CHAPTER 1
INTRODUCTION TO RELIABILITY-CENTERED MAINTENANCE 



1-1. Purpose 

The purpose of this technical manual is to provide facility managers with the information and procedures
necessary to develop and update a preventive maintenance (PM) program for their facilities that is based 
on the reliability characteristics of equipment and components and cost. Such a PM program will help to 
achieve the highest possible level of facility availability at the minimum cost. 

1-2. Scope 

The information in this manual reflects the commercial practices and lessons learned over many years of 
developing cost-effective preventive maintenance programs for a wide variety of systems and equipment. 
It specifically focuses on developing PM programs for electrical and mechanical systems used in 
command, control, communications, computer, intelligence, surveillance, and reconnaissance (C4ISR) 
facilities based on the reliability characteristics of those systems and economic considerations, while 
ensuring that safety is not compromised. The process for developing such a PM program is called 
Reliability-Centered Maintenance, or RCM. Two appendices develop key topics more deeply: appendix 
B, statistical distribution; and appendix C, availability. 

1-3. References 

Appendix A contains a complete list of references used in this manual. 

1-4. Availability, maintenance, and reliability 

In addition to the following key terms, the glossary lists acronyms, abbreviations, and additional 
definitions for terms used in this document. Additional terms are included to help the reader better 
understand the concepts presented herein. 

a. Availability. (Also see appendix C). Availability is defined as the instantaneous probability that a 
system or component will be available to perform its intended mission or function when called upon to do 
so at any point in time. It can be measured in one of several ways. 



(1) Operational availability (A_o). One equation for availability directly uses parameters related
to the reliability and maintainability characteristics of the item as well as the support system. Equation 1
reflects this measure.

$$A_o = \frac{\text{MTBM}}{\text{Mean Downtime} + \text{MTBM}}$$ (Equation 1)

where MTBM is the Mean Time Between Maintenance.

(2) Inherent availability (A_i). In equation 1, MTBM includes all maintenance required for any
reason, including repairs of actual design failures, repairs of induced failures, cases where a failure cannot
be confirmed, and preventive maintenance. When only maintenance required to correct design failures
is counted and the effects of the support system are ignored, the result is inherent availability, which is
given by equation 2.

$$A_i = \frac{\text{MTBF}}{\text{Mean Time to Repair} + \text{MTBF}}$$ (Equation 2)

where MTBF is the Mean Time Between Failure.
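
The two availability measures in equations 1 and 2 can be illustrated with a short calculation. The following is a minimal sketch only; the function names and the numerical values are hypothetical and were chosen for illustration, not taken from any fielded system.

```python
def operational_availability(mtbm_hours: float, mean_downtime_hours: float) -> float:
    """Equation 1: MTBM / (mean downtime + MTBM)."""
    return mtbm_hours / (mean_downtime_hours + mtbm_hours)


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Equation 2: MTBF / (mean time to repair + MTBF)."""
    return mtbf_hours / (mttr_hours + mtbf_hours)


# Hypothetical values (hours), assumed only for this example.
a_o = operational_availability(mtbm_hours=400.0, mean_downtime_hours=8.0)
a_i = inherent_availability(mtbf_hours=1000.0, mttr_hours=4.0)

print(f"Operational availability: {a_o:.4f}")
print(f"Inherent availability:    {a_i:.4f}")
```

With these assumed inputs, inherent availability is higher than operational availability because it counts only design failures and ignores support-system delays.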

b. Maintenance. Maintenance is defined as those activities and actions that directly retain the proper 
operation of an item or restore that operation when it is interrupted by failure or some other anomaly. 
(Within the context of RCM, proper operation of an item means that the item can perform its intended 
function.) These activities and actions include removal and replacement of failed items, repair of failed 
items, lubrication, servicing (includes replenishment of consumables such as fuel), and calibrations. 
Other activities and resources are needed to support maintenance. These include spares, procedures, 
labor, training, transportation, facilities, and test equipment. These activities and resources are usually 
referred to as logistics. Although some organizations may define maintenance to include logistics, it will 
be used in this TM in the more limited sense and will not include logistics. 

(1) Corrective maintenance. Corrective maintenance is maintenance required to restore a failed 
item to a specified condition. Restoration is accomplished by removing the failed item and replacing it 
with a new item, or by fixing the item by removing and replacing internal components or by some other 
repair action. 

(2) Preventive maintenance. Preventive maintenance is scheduled maintenance, or maintenance
performed based on the condition of an item, that is conducted to ensure safety, reduce the likelihood of
operational failures, and obtain as much useful life as possible from an item.

(3) Condition-based maintenance. Condition-based maintenance can be performed on the basis of 
observed wear or on predicting when the risk of failure is excessive. 

(a) Some items exhibit wear as they are used. If the probability of failure can be related to a 
measurable amount of wear, it may be possible to prescribe how much wear can be tolerated before the 
probability of failure reaches some unacceptable level. If so, then this point becomes the criterion for 
removal or overhaul. Measurement can be done using a variety of techniques depending on the 
characteristic being measured. The length of cracks in structures, for example, can be measured using
x-ray and ultrasound.

(b) In predictive maintenance, a given operating characteristic of the item, vibration or 
temperature, for example, is trended and compared with the known "normal" operating levels. An 
acceptable range is established with either upper and lower limits, or some maximum or minimum level. 
As long as the trend data remain inside the acceptable range, any variation is considered to be normal
variation due to variances in materials, operating environment, and so forth. When the trend line 
intersects the "unacceptable" limit line, preventive maintenance is required to prevent a failure in the 
future. The limits are based on knowledge of the normal operating characteristics and the level of risk of 
failure we are willing to accept. 
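
The trending idea described above can be sketched as a straight-line fit that is extrapolated to the limit line. This is an illustrative sketch under stated assumptions: a linear trend, a single upper limit, and hypothetical vibration readings. The function names are not from this manual, and real condition data may call for a different trend model.

```python
def fit_line(times, values):
    """Ordinary least-squares straight-line fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_limit(times, values, upper_limit):
    """Time at which the fitted trend reaches upper_limit, or None if the trend is not rising."""
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    return (upper_limit - intercept) / slope


# Hypothetical vibration readings (mm/s) taken at the given operating hours.
hours = [0, 100, 200, 300, 400]
vibration = [2.0, 2.2, 2.4, 2.6, 2.8]

crossing = time_to_limit(hours, vibration, upper_limit=4.0)
print(f"Trend reaches the limit at about {crossing:.0f} operating hours")
```

Preventive maintenance would be scheduled before the projected crossing time, with a margin that reflects the accepted level of risk.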

c. Reliability. Reliability is defined as the probability that a component can perform its intended 
function for a specified time interval (t) under stated conditions. 

d. Reliability-centered maintenance (RCM). RCM is a logical, structured framework for determining 
the optimum mix of applicable and effective maintenance activities needed to sustain the operational 
reliability of systems and equipment while ensuring their safe and economical operation and support. 
Although RCM focuses on identifying preventive maintenance actions, corrective actions are identified 
by default. That is, when no preventive action is effective or applicable for a given item, that item is run 
to failure (assuming safety is not at issue). From that perspective, RCM identifies all maintenance. RCM
is focused on optimizing readiness, availability, and sustainment through effective and economical 
maintenance. 

1-5. The reliability-centered maintenance concept 

Prior to the development of the RCM methodology, it was widely believed that everything had a "right" 
time for some form of preventive maintenance (PM), usually replacement or overhaul. Many maintenance
personnel believed that replacing parts of a product, or overhauling the product (or reparable portions
thereof), would reduce the frequency of failures during operation. Despite this commonly held view, the
results seemed to tell a different story. In far too many
instances, PM seemed to have no beneficial effects. Indeed, in many cases, PM actually made things 
worse by providing more opportunity for maintenance-induced failures. 

a. Airline study. When the airline companies in the United States observed that PM did not always 
reduce the probability of failure and that some items did not seem to benefit in any way from PM, they 
formed a task force with the Federal Aviation Administration (FAA) to study the subject of preventive 
maintenance. The results of the study confirmed that PM was effective only for items having a certain 
pattern of failures. The study also concluded that PM should be mandatory only when needed to assure
safe operation. Otherwise, the decision to do or not do PM should be based on economics.

b. RCM approach. The RCM approach provides a logical way of determining if PM makes sense for a 
given item and, if so, selecting the appropriate type of PM. The approach is based on the following 
precepts. 

(1) The objective of maintenance is to preserve an item's function(s). RCM seeks to preserve system 
or equipment function, not just operability for operability's sake. Redundancy improves functional 
reliability but increases life cycle cost in terms of procurement and operating costs.

(2) RCM focuses on the end system. RCM is more concerned with maintaining system function than
individual component function. 

(3) Reliability is the basis for decisions. The failure characteristics of the item in question must be 
understood to determine the efficacy of preventive maintenance. RCM is not overly concerned with 
simple failure rate; it seeks to know the conditional probability of failure at specific ages (the probability 
that failure will occur in each given operating age bracket). 

(4) RCM is driven first by safety and then economics. Safety must always be preserved. When 
safety is not an issue, preventive maintenance must be justified on economic grounds. 

(5) RCM acknowledges design limitations. Maintenance cannot improve the inherent reliability of an
item, which is dictated by design. Maintenance, at best, can sustain the design level of reliability over the life of an
item. 

(6) RCM is a continuing process. The difference between the perceived and actual design life and 
failure characteristics is addressed through age (or life) exploration. 
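
Precept (3) refers to the conditional probability of failure in each operating age bracket. The sketch below shows one way to compute it. The Weibull distribution, its parameters, and the age brackets are assumptions made for this example; a shape parameter of 1 reduces the Weibull to the exponential (constant failure rate) case, while a shape greater than 1 represents wear-out.

```python
import math


def reliability(t, shape, scale):
    """Weibull survival function R(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def conditional_failure_probability(age_start, age_end, shape, scale):
    """Probability of failing in (age_start, age_end], given survival to age_start."""
    return 1.0 - reliability(age_end, shape, scale) / reliability(age_start, shape, scale)


# Hypothetical operating age brackets, in hours.
brackets = [(0, 1000), (1000, 2000), (2000, 3000)]

# Shape 1.0: exponential case (constant failure rate).
constant_rate = [conditional_failure_probability(a, b, 1.0, 5000.0) for a, b in brackets]
# Shape 3.0: wear-out case (increasing probability of failure with age).
wear_out = [conditional_failure_probability(a, b, 3.0, 5000.0) for a, b in brackets]

for (a, b), p_const, p_wear in zip(brackets, constant_rate, wear_out):
    print(f"{a:>5}-{b:<5} h  constant rate: {p_const:.4f}  wear-out: {p_wear:.4f}")
```

In the constant-rate case every bracket has the same conditional probability of failure, so replacing an old item with a new one changes nothing. In the wear-out case the probability grows with age, which is the pattern for which PM can be effective.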

c. RCM concept. The RCM concept has completely changed the way in which PM is viewed. It is 
now a widely accepted fact that not all items benefit from PM. Moreover, even when PM would be 
effective, it is often less expensive (in all senses of that word) to allow an item to "run to failure" rather 
than to do PM. In the succeeding discussions, we will examine the RCM concept in more detail. We will 
explore the meaning of terms that are central to the RCM approach. These terms include failure
characteristics, efficiency, run to failure, cost, and function. 

1-6. Benefits of RCM 

a. Reduced costs. A significant reason for creating the aforementioned joint airline/FAA task force was 
the new Boeing 747 (B747) jumbo jet. Boeing and United Airlines, the initial buyer of the aircraft, were 
already considering the development of the PM program for the B747. This new airliner was vastly larger 
and more complex than any ever built. Given the cost of maintenance on smaller aircraft already in 
service, the maintenance costs for the B747, using the traditional approach to PM, would have threatened 
the profitability, and hence the viability, of operating the new aircraft. Examples of the ultimate savings 
achieved in using RCM to develop the PM program for the B747 and other aircraft are shown in table 1-1. 
Similar savings have been achieved by other industries for other equipment when going from a traditional 
to an RCM-based PM program. It is important to note that these cost savings are achieved with no
reduction in safety, an obvious requirement in the airline industry. 



Table 1-1. Cost benefits of using RCM for developing PM program

Type of PM                    Required Using                Required Using RCM
                              Traditional Approach
Structural inspections        4,000,000 hours for DC-8      66,000 hours for B747
Overhaul                      339 items for DC-8            7 items for DC-10
Overhaul of turbine engine    Scheduled                     On-condition (cut shop maintenance
                                                            costs by 50% compared with DC-8)


b. Increased availability. For many systems, including C4ISR facilities, availability is of primary 
importance. Availability was defined in paragraph 1-4. As indicated in the definition, the level of 
availability achieved in actual use of a product is a function of how often it fails and how quickly it can be 
restored to operation. The latter, in turn, is a function of how well the product was designed to be 
maintainable, the amount of PM required, and the logistics resources and infrastructure that have been put 
in place to support the product. RCM directly contributes to availability by reducing PM to that which is 
essential and economic. 

1-7. Origins of RCM 

a. Airlines. As stated earlier, RCM had its origins in the airline industry. Nowhere had the
then-prevailing philosophy of maintenance been challenged more. By the late 1950s, maintenance costs in the
industry had increased to a point where they had become intolerable. Meanwhile, the Federal Aviation 
Agency (FAA) had learned through experience that the failure rate of certain types of engines could not 
be controlled by changing either the frequency or the content of scheduled fixed-interval overhauls. As a 
result of these two factors, a task force consisting of representatives of the airlines and aircraft 
manufacturers was formed in 1960 to study the effectiveness of PM as being implemented within the 
airline industry. 

(1) The task force. The task force developed a rudimentary technique for developing a PM program. 
Subsequently, a maintenance steering group (MSG) was formed to manage the development of the PM 
program for the new Boeing 747 (B747) jumbo jet. This new airliner was vastly larger and more complex 
than any ever built. Given the cost of maintenance on smaller aircraft already in service, the maintenance
costs for the B747, using the traditional approach to PM, would have threatened the profitability, and
hence the viability, of operating the new aircraft.

TM 5-698-2

(2) MSG-1. The PM program developed by the steering group, documented in a report known as 
MSG-1, was very successful. That is, it resulted in an affordable PM program that ensured the safe and 
profitable operation of the aircraft. 

(3) MSG-2. The FAA was so impressed with MSG-1 that it requested that the logic of the new
approach be generalized, so that it could be applied to other aircraft. So in 1970, MSG-2,
Airline/Manufacturer Maintenance Program Planning Document, was issued. MSG-2 defined and standardized
the logic for developing an effective and economical maintenance program. MSG-2 was first used on the
L-1011, DC-10, and MD-80 aircraft. In 1972, the European aviation industries issued EMSG (European
Maintenance System Guide), which improved on MSG-2 in the structures and zonal analysis. EMSG was 
used on the Concorde and A300 Airbus. 

b. Adoption by military. The problems that the airlines and FAA had experienced with the traditional 
approach to maintenance were also affecting the military. Although profit was not an objective common 
to both the airlines and military, controlling costs and maximizing the availability of their aircraft were. 
Consequently, in 1978, the DOD contracted with United Airlines to conduct a study into efficient 
maintenance programs. The study supplemented MSG-2 by emphasizing the detection of hidden failures 
and moved from a process-oriented concept to a task-oriented concept. The product of the study was 
MSG-3, a decision logic that was called Reliability-Centered Maintenance (RCM). 

c. Use for facilities and other industries. Although created by the aviation industry, RCM quickly 
found applications in many other industries. RCM is used to develop PM programs for public utility 
plants, especially nuclear power plants, railroads, processing plants, and manufacturing plants. It is no 
overstatement to say that RCM is now the pre-eminent method for evaluating and developing a 
comprehensive maintenance program for an item. Today, a variety of documents are available on RCM. 
A listing of some of the more prominent documents is included in appendix A. 

1-8. Relationship of RCM to other disciplines 

a. Reliability. It is obvious why the first word in the title of the MSG-3 approach is reliability. Much 
of the analysis needed for reliability provides inputs necessary for performing an RCM analysis, as will 
be seen in succeeding sections. The fundamental requirement of the RCM approach is to understand the 
failure characteristics of an item. As used herein, failure characteristics include the underlying failure 
rate, the consequences of failure, and whether or not the failure manifests itself and, if it does, how. 
Reliability is measured in different ways, depending on one's perspective: inherent reliability, operational 
reliability, mission (or functional) reliability, and basic (or logistics) reliability. RCM is related to 
operational reliability. 

(1) Inherent versus operational reliability. From a designer's perspective, reliability is measured by 
"counting" only those failures that are design-related. When measured in this way, reliability is referred 
to as "inherent reliability." From a user's or operator's perspective, any event that causes the system to stop
performing its intended function is a failure event. These events certainly include all design-related
failures that affect the system's function. Also included are maintenance-induced failures,
no-defect-found events, and other anomalies that may have been outside the designer's contractual responsibility or
technical control. This type of reliability is called "operational reliability."

(2) Mission or functional reliability versus basic or logistics reliability. Any failure that causes the 
product to fail to perform its function or mission is counted in "mission reliability." Redundancy 
improves mission reliability. Consider a case where one part of a product has two elements in parallel
where only one is needed (redundant). If one element of the redundant part of the product
fails, the other continues to function, allowing the product to do its job. Only if both elements fail will a
mission failure occur. In "basic" reliability, all failures are counted, whether or not a mission or 
functional failure has occurred. This measure of reliability reflects the total demand that will eventually 
be placed on maintenance and logistics. 
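
The difference between mission and basic reliability for the redundant case described above can be shown numerically. This is a minimal sketch that assumes the two elements fail independently; the function names and the element reliability of 0.9 are hypothetical.

```python
def mission_reliability_parallel(element_reliabilities):
    """Probability that at least one redundant element survives the mission."""
    prob_all_fail = 1.0
    for r in element_reliabilities:
        prob_all_fail *= (1.0 - r)
    return 1.0 - prob_all_fail


def basic_reliability(element_reliabilities):
    """Probability that no element fails, so that no maintenance demand arises."""
    prob_none_fail = 1.0
    for r in element_reliabilities:
        prob_none_fail *= r
    return prob_none_fail


# Two redundant elements, each with an assumed reliability of 0.9 over the mission.
elements = [0.9, 0.9]
mission = mission_reliability_parallel(elements)
basic = basic_reliability(elements)

print(f"Mission reliability: {mission:.2f}")
print(f"Basic reliability:   {basic:.2f}")
```

Under these assumptions, redundancy raises mission reliability above that of a single element while lowering basic reliability, because there are more elements that can fail and require maintenance.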

b. Safety. Earlier, it was stated that one of the precepts on which the RCM approach is based is that
safety must always be preserved. Given that the RCM concept came out of the airline industry, this
emphasis on ensuring safety should come as no surprise. Later sections discuss the manner in which the
RCM logic ensures safety. For now, it is sufficient to note that RCM specifically
addresses safety and is intended to ensure that safety is never compromised. In the past several years,
environmental concerns and issues involving regulatory bodies have been accorded an importance in the 
RCM approach for some items that is equal (or nearly so) to safety. Failures of an item that can cause 
damage to the environment or which result in some Federal or state law being violated can pose serious 
consequences for the operator of the item. So the RCM logic is often modified, as it is in this TM, to 
specifically address environmental, mission, or other concerns. 

c. Maintainability. RCM is a method for prescribing PM that is effective and economical. Whether or 
not a given PM task is effective depends on the reliability characteristics of the item in question. Whether 
or not a task is economical depends on many factors, including how easily the PM tasks can be 
performed. Ease of maintenance, corrective or preventive, is a function of how well the system has been 
designed to be maintainable. This aspect of design is called maintainability. Providing ease of access, 
placing items requiring PM where they can be easily removed, providing means of inspection, designing 
to reduce the possibility of maintenance-induced failures, and other design criteria determine the 
maintainability of a system. 

CHAPTER 2
ESSENTIAL ELEMENTS OF A SUCCESSFUL RCM PROGRAM 

2-1. RCM implementation plan 

An overview of steps of the RCM process is shown in figure 2-1. 

Figure 2-1. The RCM process starts in the design phase and continues for the life of the system.

a. Major tasks. As shown in figure 2-1, several major tasks are required to implement the RCM
concept. 

• Define the System
  - Identify and document the boundaries of the analysis
  - Identify and document equipment included in the analysis
  - Identify and document the indenture level the analysis is intended to extend to

• Define Ground Rules and Assumptions - Identify and document ground rules and assumptions 
used to conduct the analysis 

• Construct Equipment Tree - Construct equipment block diagrams to indicate equipment 
configuration, down to the lowest indenture level intended to be covered by the analysis 

• Conduct FMECA - Analyze failure modes, effects and criticality 

• Assign Maintenance Focus Levels - Classify maintenance focus levels based on criticality 
rankings 

• Apply RCM Decision Logic - Apply RCM logic trees for items, especially those identified as 
being critical (see figure 5-2) 

• Identify Maintenance Tasks - Identify maintenance tasks to be performed on the given item 

• Package Maintenance Program - Develop a maintenance tasking schedule for the analyzed 
equipment 

Note: RCM Analysis is intended to be a living analysis. Effort should be made to continue to collect more 
complete information and add it to the analysis, to continue to provide a foundation for effective 
continuous improvement. Results and recommendations should be periodically reviewed and
re-evaluated, taking into consideration additional information of any kind.

(1) Conduct supporting analyses. RCM is a relatively information-intensive process. To provide 
the information needed to conduct the RCM analysis, several supporting analyses are either required, 
often as prerequisites to beginning the RCM analysis, or desirable. These supporting analyses include the 
Failure Modes and Effects Analysis, Fault Tree Analysis, functional analysis, and others. 

(2) Conduct the RCM analysis. The RCM analysis consists of using a logic tree (see figure 5-2) to 
identify effective, economical, and, when safety is concerned, required PM. (As will be seen, PM is 
required when safety is involved; if no PM is effective, then redesign is mandatory). 
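
The top-level reasoning stated above (safety first, then economics, with run to failure as the default) can be sketched as a small decision function. This is a simplified illustration only and is not the logic tree of figure 5-2; the function, the enumeration, and the three yes/no inputs are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    PERFORM_PM = "perform PM"
    REDESIGN = "redesign"
    RUN_TO_FAILURE = "run to failure"


def select_maintenance(affects_safety: bool,
                       pm_is_effective: bool,
                       pm_is_economical: bool) -> Decision:
    """Simplified selection: safety is decided first, then economics."""
    if affects_safety:
        # PM is required when safety is involved; if no PM is effective,
        # redesign is mandatory.
        return Decision.PERFORM_PM if pm_is_effective else Decision.REDESIGN
    # When safety is not an issue, PM must be justified on economic grounds.
    if pm_is_effective and pm_is_economical:
        return Decision.PERFORM_PM
    return Decision.RUN_TO_FAILURE


print(select_maintenance(affects_safety=True, pm_is_effective=False, pm_is_economical=False).value)
print(select_maintenance(affects_safety=False, pm_is_effective=True, pm_is_economical=False).value)
```

The second call shows an item for which PM would work but is not worth its cost, so the item is run to failure.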

b. The implementation plan. Planning to implement an RCM approach to defining the PM for a 
system or product must address each of the tasks noted in the preceding paragraph. The plan must 
address the supporting design phase analyses needed to conduct an RCM analysis. Based on the analysis, 
an initial maintenance plan, consisting of the identified PM with all other maintenance being corrective, 
by default, is developed. This initial plan should be updated through Life Exploration during which 
initial analytical results concerning frequency of failure occurrence, effects of failure, costs of repair, etc. 
are modified based on actual operating and maintenance experience. Thus, the RCM process is iterative, 
with field experience being used to improve upon analytical projections. 

2-2. Data collection requirements

a. Required data. Since conducting an RCM analysis requires an extensive amount of information, 
and much of this information is not available early in the design phase, RCM analysis for a new product 
cannot be completed until just prior to production. The data falls into four categories: failure 
characteristics, failure effects, costs, and maintenance capabilities and procedures. 

(1) Failure characteristics. Studies conducted by the MSGs and confirmed by later studies showed 
that PM was effective only for certain underlying probability distributions. Components and items, for 
example, for which a constant failure rate applies (i.e., the underlying probability distribution is the
exponential) do not benefit from PM. Only when there is an increasing probability of failure should PM 
be considered. 

(2) Failure effects. The effects of failure of some items are minor or even insignificant. The 
decision whether or not to use PM for such items is based purely on costs. If it is less expensive to allow 
the item to fail, and to perform corrective maintenance (CM), than it is to perform PM, then the item is allowed to fail. As stated
earlier, allowing an item to fail is called run to failure. 

(3) Costs. The costs that must be considered are the costs of performing a PM task(s) for a given 
item, the cost of performing CM for that item, and the economic penalties, if any, when an operational 
failure occurs. 

(4) Maintenance capabilities and procedures. Before selecting certain maintenance tasks, the 
analyst needs to understand what the capabilities are, or are planned, for the system. In other words, what 
is or will be the available skill levels, what maintenance tools are available or are planned, and what are 
the diagnostics being designed into or for the system. 
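
The cost elements listed in paragraph (3) can be combined into a simple comparison of PM against run to failure. The sketch below assumes a yearly cost model and hypothetical figures; the function name, the rates, and the costs are illustrative only. Such a comparison applies only when safety is not affected.

```python
def annual_cost(pm_tasks_per_year, pm_cost, failures_per_year, cm_cost, failure_penalty):
    """Yearly cost: PM tasks plus, for each failure, the CM cost and the economic penalty."""
    return pm_tasks_per_year * pm_cost + failures_per_year * (cm_cost + failure_penalty)


# Hypothetical item: PM reduces failures from 2.0 to 0.5 per year,
# but requires four PM tasks per year.
run_to_failure = annual_cost(pm_tasks_per_year=0, pm_cost=0.0,
                             failures_per_year=2.0, cm_cost=1500.0, failure_penalty=500.0)
with_pm = annual_cost(pm_tasks_per_year=4, pm_cost=800.0,
                      failures_per_year=0.5, cm_cost=1500.0, failure_penalty=500.0)

choice = "PM" if with_pm < run_to_failure else "run to failure"
print(f"Run to failure: {run_to_failure:.0f} per year")
print(f"With PM:        {with_pm:.0f} per year")
print(f"Lower-cost option: {choice}")
```

In this assumed case PM is effective, since it reduces failures, but it is still the more expensive option, so the item would be allowed to run to failure.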

b. Sources of data. Table 2-1 lists some of the sources of data for the RCM analysis. The data 
elements from the Failure Modes and Effects Analysis (FMEA) that are applicable to RCM analysis are 
highlighted in paragraph 5-5b. Note that when RCM is being applied to a product already in use, or when
a maintenance program is updated during Life Exploration, historical maintenance and failure data will
be inputs for the analysis. An effective Failure Reporting and Corrective Action System (FRACAS) is an 
invaluable source of data. 

Table 2-1. Data sources for the RCM analysis

Data Source                     Comment
Lubrication requirements        Determined by designer. For off-the-shelf items being integrated into
                                the product, lubrication requirements and instructions may be available.
Repair manuals                  For off-the-shelf items being integrated into the product.
Engineering drawings            For new and off-the-shelf items being integrated into the product.
Repair parts lists
Quality deficiency reports      For off-the-shelf items being integrated into the product.
Other technical documentation   For new and off-the-shelf items being integrated into the product.
PREP Database                   For new and off-the-shelf items being integrated into the product.
Recorded observations           From test of new items and field use of off-the-shelf items being
                                integrated into the product.
Hardware block diagrams         For new and off-the-shelf items being integrated into the product.
Bill of Materials               For new and off-the-shelf items being integrated into the product.
Functional block diagrams       For new and off-the-shelf items being integrated into the product.
Existing maintenance plans      For off-the-shelf items being integrated into the product. Also may be
                                useful if the new product is a small evolutionary improvement of a
                                previous product.
Maintenance technical           For off-the-shelf items being integrated into the product.
orders/manuals
Discussions with maintenance    For off-the-shelf items being integrated into the product. Also may be
personnel and field operators   useful if the new product is a small evolutionary improvement of a
                                previous product.
Results of FMEA, FTA, and       For new and off-the-shelf items being integrated into the product.
other reliability analyses      Results may not be readily available for the latter.
Results of Maintenance task     For new and off-the-shelf items being integrated into the product.
analysis                        Results may not be readily available for the latter.


2-3. Data analysis 

Data can be considered the lifeblood of RCM. The data from the sources listed in table 2-1 is used in 
several ways. Data provides the basis for determining the failure characteristics of items. It is also used 
to evaluate the effectiveness of specific PM tasks used on past systems. Economic data provides the basis 
for determining whether PM is more economical than running an item to failure (only done when safety is 
not affected). 

2-4. Commitment to life cycle support of the program 

a. The Process Perspective. As will be shown in this section, RCM must be viewed as a continuing 
process, rather than an event that occurs once. Although a maintenance program based on RCM should 
be developed during design, it should be refined throughout the operational life of the system. In 
addition, RCM can be used to develop a maintenance program for an existing system for which the initial 
maintenance program was not based on RCM.

b. Learning from Experience. Much of the information used to develop an RCM program, either
during design for a new system or after fielding for an existing system, will be based on estimates, may
change over time, or both. Consequently, it is essential to
use experiential data to update the maintenance program. 

2-5. RCM as a part of design

It is ideal to implement an RCM approach during the design and development of a new system to develop 
a maintenance program. The reasons will be briefly discussed here but will become clearer as the reader 
proceeds through the remaining sections of this TM. 

a. Effective use of analyses. During design and development, numerous analyses are performed. 
Many of these analyses directly support an RCM analysis. In turn, the results of going through the RCM 
process of developing a maintenance program can affect and contribute to these analyses. Obviously, 
implementing RCM during design and development makes very effective use of analyses that are usually 
performed. 

b. Impact on design. As will be seen when the RCM logic diagrams are discussed, redesign is either 
mandatory or desirable in many cases. The cost and level of effort of design changes made during the 
design and development phase of a system are much less than if they were made after the system was 
fielded. Additionally, the effectiveness of design changes is higher when made during the design and 
development phase. Of course, RCM can be, and is, used to develop maintenance programs for fielded
systems for which RCM was not applied during design and development. However, it is always best to
implement RCM during design and development. 

2-6. Focus on the four Ws 

This paragraph discusses the four Ws: what can fail, why does it fail, when will it fail, and what are the
consequences of failure.

a. What can fail? In determining required maintenance, the first and most fundamental question that 
must be answered is what can fail. A variety of methods can be used to answer this question. 

(1) Analytical methods. Failure Modes and Effects Analysis, Fault Tree Analysis, and related
analyses address, among other issues, what can fail that will prevent a system, subsystem, or component 
from performing its function(s). 

(2) Test. Analytical methods are not infallible and a particular failure may be overlooked or cannot 
be anticipated by analysis. Testing often reveals these failures. Testing can, of course, also be used to 
confirm or validate the results of analytical methods. 

(3) Field experience. Often, the same type of component, assembly, or even subsystem that is 
already used in one system may be used in a new system. If data is collected on field performance of 
these components, assemblies, and subsystems, it can be used to help answer the question, what can fail. 
Obviously, field experience is equally applicable to RCM when applied to an already fielded system. 

b. Why does an item fail? To determine which, if any, preventive maintenance tasks are appropriate,
the reason for failure must be known. Insights into the modes and mechanisms of failure can be gained
through analysis, test, and past experience. Some of the analytical methods are the same as those used to
determine what can fail. The methods include the FMEA and FTA. Others include root cause analysis,
destructive physical analysis, and non-destructive inspection techniques. Table 2-2 lists some
non-destructive inspection (NDI) techniques and table 2-3 lists some of the modes and mechanisms of failure.

Table 2-2. Non-destructive inspection (NDI) techniques, briefly

Acoustic emission              Magnetic particle examination
Dye penetrant                  Radiography
Eddy current                   Spectrometric oil analysis
Emission spectroscopy          Stroboscopy
Ferrography                    Thermography
Leak testing                   Ultrasonics



Table 2-3. Examples of failure mechanisms and modes

Modes
   Stuck open (valve)
   Fractured (shaft)
   Wear (bearing)
   Shorted (connector)
   Leakage (seal)
   Slippage (belt drive)
   Low torque (motor)
   Excessive friction (shaft journal)
   Short (resistor)

Mechanisms
   Brinelling (bearing ring)
   Spalling (concrete)
   Elongation/yielding (structure)
   Fretting (pump shaft)
   Condensation (circuit board)
   Freezing (battery)
   Ionization (microcircuit)
   Glazing (clutch plate)
   Fatigue (springs)
   Plastic deformation (springs)
   Wear (clutch plate)
   Galvanic corrosion (structure)



c. When will an item fail? (Occurrence) If the underlying time to failure distribution is known for a 
part or assembly, then the probability of failure at any point in time can be predicted. For some items, the 
underlying distribution is exponential and the item exhibits a constant failure rate. In such cases, a new 
item used to replace an old item has exactly the same probability of failing in the next instant of time as 
did the old item. Consequently, changing such an item at some prescribed interval has no effect on the 
probability of failure. It makes more sense to run the item to failure. If that is not possible, if safety is 
involved for example, then redesign is necessary. As shown in figure 2-2, only a small percentage of 
items can benefit from PM. Knowing the underlying distribution of times to failure is essential in 
determining if PM is applicable. 
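
The argument above can be checked numerically. The following is a minimal sketch, not part of the manual's procedures: it assumes a Weibull time-to-failure model (shape 1 is the exponential case; shape greater than 1 is used here as a stand-in for wearout), and the scale, ages, and interval are hypothetical values chosen only for illustration.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that an item survives to age t under a Weibull model."""
    return math.exp(-((t / scale) ** shape))


def conditional_failure_probability(age, interval, shape, scale):
    """Probability of failing within `interval`, given survival to `age`."""
    survive_now = weibull_reliability(age, shape, scale)
    survive_later = weibull_reliability(age + interval, shape, scale)
    return 1.0 - survive_later / survive_now


# Hypothetical parameters: scale of 1,000 hours, look-ahead of 100 hours.
# shape = 1.0 reduces the Weibull to the exponential (constant failure rate).
new_exp = conditional_failure_probability(0.0, 100.0, shape=1.0, scale=1000.0)
old_exp = conditional_failure_probability(900.0, 100.0, shape=1.0, scale=1000.0)

# shape = 3.0 is an assumed wearout pattern (failure rate rises with age).
new_wear = conditional_failure_probability(0.0, 100.0, shape=3.0, scale=1000.0)
old_wear = conditional_failure_probability(900.0, 100.0, shape=3.0, scale=1000.0)

print(f"exponential: new={new_exp:.4f} old={old_exp:.4f}")
print(f"wearout:     new={new_wear:.4f} old={old_wear:.4f}")
```

In the exponential case the new and old items have the same chance of failing in the next 100 hours, so scheduled replacement gains nothing. In the assumed wearout case the old item is far more likely to fail, which is the situation in which an age limit may be desirable.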

[Figure 2-2 shows six failure patterns: bathtub; wearout; gradual rise with no distinct wearout point;
initial increase, then level off; uniform (constant failure); and infant mortality. Annotations in the
figure: "Age limit may be desirable" and "89% cannot benefit from limit on operating age."]



Figure 2-2. Applicability of age limit depending on failure pattern 

d. What are the consequences of the item failing? (Severity) Not all failures are equal in their effect on 
the system. Obviously, any failures that can cause death or injury to system operators or maintainers, or 
others who may be served by the system (e.g., airline passengers) or are nearby are the most serious. 
Very close in seriousness are failures that can result in compromised mission requirements, pollution to 
the environment, or a violation of government statutes. At the bottom of the list are failures such as 
cosmetic damage and other problems that have no effect on system operation. Knowing the effect of a 
failure helps prioritize decisions. Serious failures usually demand some form of PM or redesign.
Minor failures usually do not lead to redesign, and PM is performed only if it is less expensive
than running the item to failure. Table 2-4 lists some examples of failure effect
categorization used in FMEAs and in the RCM process. The manner in which failure effects are 
categorized for C4ISR facilities should be based on the functions of the facility. Obviously, any failure 
that could kill or injure personnel or cause loss of the C4ISR mission would have to be categorized as the 
most serious. The criteria shown in table 2-4 or some combination could be the basis for a C4ISR 
facility-specific categorization approach. Note that in using the RCM approach to developing a PM
program, all failures must be put into one of three categories (Preventive Maintenance, Predictive
Maintenance, Corrective Maintenance). These categories are used in the logic trees.



Table 2-4. Examples of failure effect categorization

AIAG Standard (Automobile Industry Standard)

Effect | Severity of Effect | Ranking
Hazardous without warning | Very high severity ranking when a potential failure mode affects safe system operation and/or involves non-compliance with federal safety regulation without warning | 10
Hazardous with warning | Very high severity ranking when a potential failure mode affects safe system operation and/or involves non-compliance with federal safety regulation with warning | 9
Very High | System/item inoperable with loss of primary function | 8
High | System/item operable, but at reduced performance level. User dissatisfied | 7
Moderate | System/item operable, but comfort/convenience item inoperable | 6
Low | System/item operable, but comfort/convenience item operable at reduced level | 5
Very Low | Defect noticed by most customers | 4
Minor | Defect noticed by average customer | 3
Very Minor | Defect noticed by discriminating customer | 2
None | No effect | 1

Example of a Simplified Categorization

Critical | Death, loss of system, violation of governmental statute
High | Injury, loss of some system functions, very high economic loss
Moderate | Damage to system requiring maintenance at first opportunity, economic loss
Low | Minor damage to system, low economic loss
Negligible | Cosmetic damage, no economic loss

RCM Analysis

Safety | Directly and adversely affects operating safety
Operational | Prevents the end system from completing a mission
Economic | Does not adversely affect safety and does not adversely affect operations; the only effect is the cost to repair the failure

CHAPTER 3
MAINTENANCE OF SYSTEMS



3-1. Introduction 

Maintenance is defined as those activities and actions that directly retain the proper operation of an item 
or restore that operation when it is interrupted by failure or some other anomaly. (Within the context of 
RCM, proper operation of an item means that the item can perform its intended function). These 
activities and actions include fault detection, fault isolation, removal and replacement of failed items, 
repair of failed items, lubrication, servicing (includes replenishment of consumables such as fuel), and 
calibrations. Other activities and resources are needed to support maintenance. These include spares, 
procedures, labor, training, transportation, facilities, and test equipment. These activities and resources 
are usually referred to as logistics. Although some organizations may define maintenance to include 
logistics, it will be used in this document in the more limited sense and will not include logistics. 

3-2. Categories of maintenance 

Maintenance is usually categorized by either when the work is performed or where the work is performed. 

a. Categorizing by when maintenance is performed. In this case, maintenance is divided into two 
major categories: preventive and corrective. Figure 3-1 illustrates how these two categories are further 
broken down into specific tasks. These categories of maintenance, corrective and preventive, are further 
subdivided in some references into reactive, preventive, predictive, and proactive maintenance. 



[Figure 3-1 is a tree diagram in which MAINTENANCE divides into two branches:

Preventive (or Scheduled) Maintenance (PM)
   Required by: safety, condition, servicing
   Common PM actions: gain access, perform PM, confirm functionality, close up and secure

Corrective (or Unscheduled) Maintenance (CM)
   Required by: confirmed failures, unconfirmed failures*
   Common CM actions: gain access, fault isolation, perform CM, confirm fault corrected, close up and secure

Task boxes shown in the figure: Time Replacement, Condition Monitoring, Repair, Remove & Replace,
Calibration & Adjustment, Cleaning & Lubrication.]

* Unconfirmed failures result from false alarms in the built-in test, intermittent failures, or test equipment failures.
Unconfirmed failures will trigger some unscheduled maintenance actions, ranging from confirming no fault exists
(attributed to false alarm or Cannot Duplicate) to removing and replacing the item only to later find (at another level of
maintenance) that the item is good (Retest OK).



Figure 3-1. Major categories of maintenance by when performed. 

(1) Reactive maintenance. This term is equivalent to corrective maintenance and both are also
referred to as breakdown, repair, fix-when-fail, or run-to-failure maintenance. 

(2) Proactive maintenance. Includes actions intended to extend useful life, such as root-cause 
failure analysis, continual improvement, and age exploration. Proactive and predictive are treated herein 
as categories of preventive maintenance, with proactive included under Scheduled, predictive under 
Condition-based (see paragraph 3-1), and age exploration as a separate step in the RCM process. 

b. Categorizing by where maintenance is performed. Maintenance can also be categorized by where 
the work is performed. These categories are referred to as levels of maintenance. The categories most 
often used are shown in figure 3-2. 



[Figure 3-2 lists the levels of maintenance:

Line or Organizational: Maintenance performed on the system or equipment at the site where the product
is normally used or stored when not in use.

Field or Shop: Maintenance done on portions (e.g., subsystems, subassemblies, or components) of the
system at or near the operating/storage location. In some cases, maintenance performed on the system
itself is included when it involves "heavy" maintenance (structural repair, engine change, etc.).

Depot: Maintenance done on the system or portions of the system at a remote, centralized facility.]



Figure 3-2. Typical approach to categorizing maintenance by where it is performed. 

3-3. Categorization by when maintenance is performed 

a. Preventive maintenance. Preventive maintenance (PM) is usually self-imposed downtime (although 
it can be done while corrective maintenance is being performed and it may even be possible to perform 
some PM while the product is operating). PM consists of actions intended to prolong the operational life 
of the equipment and keep the product safe to operate. This manual defines two types of PM: Scheduled 
and Condition-based. In both cases, the objectives of PM are to ensure safety, reduce the likelihood of 
operational failures, and obtain as much useful life as possible from an item. Table 3-1 has examples of 
each type of PM. 

Table 3-1. Examples of tasks under two categories of preventive maintenance

Category: Scheduled (1)

   Task: Remove and replace (R&R)
   Examples:
      - R&R batteries in smoke alarm twice annually
      - R&R gun barrel after 5,000 rounds have been fired
      - Change oil every 3,000 miles
      - Lubricate bearings every 25,000 shaft revolutions
   Notes: Maintenance is performed without regard to actual condition of item. Interval based on
   useful life and other factors. Includes all lubrication and servicing.

   Task: Overhaul or recondition
   Examples:
      - Overhaul transmission every 100,000 miles
      - Refinish blades every 2,000 operating hours
   Notes: Item is overhauled or reconditioned without regard to actual condition. Interval based on
   useful life and other factors.

   Task: Recalibrate
   Examples:
      - Recalibrate depth setting on drill press daily
      - Recalibrate gage against standard at beginning of each shift
   Notes: Compensate for changes in calibration due to vibration and other conditions of use.

Category: Condition (2)

   Task: Inspect item or area
   Examples:
      - Visually inspect belts and pulleys for excessive wear prior to starting machine
      - Inspect for corrosion every 2 weeks
      - Inspect for delamination or disbond weekly
      - Inspect tires for cuts and proper tread depth before and after each flight
      - Inspect for hidden failure of redundant item
   Notes: Inspections can be performed using human senses (e.g., visually check belts for wear),
   using non-destructive inspection (NDI) techniques (e.g., inspect for corrosion using dye
   penetrant), or special measuring equipment (check tread depth using gage). Can also include
   functional check to determine proper operation.

   Task: Monitor condition
   Examples:
      - Continuously monitor vibration profile and R&R bearing when limits reached
      - Check sample of oil every 50 operating hours for presence of wear metals and overhaul
        engine when limits reached
   Notes: Objective is to take action before useful life has been reached or a functional failure
   has occurred. Parameter limits and profiles based on analysis, test, and field experience.
   Monitoring can but does not need to be continuous.

1. Based on time.

2. Based on observed or measured condition.

(1) Scheduled maintenance. When a specified interval between maintenance is required, the 
maintenance is referred to as scheduled preventive maintenance. The interval may be in terms of hours, 
cycles, rounds fired, or other measure meaningful to the manner in which the item is operated. Note that 
with scheduled PM, no attempt is made to ascertain the condition of the item. Scheduled maintenance 
may also consist of recalibrations or adjustments made at regular intervals. Some texts categorize 
inspections as scheduled PM. Certainly, inspections are based on some periodic interval or event (e.g., 
inspection of an aircraft prior to and after each flight). However, since the purpose of an inspection is to 
ascertain the condition of the item, we have chosen to include it under the next category of PM, 
Condition-based. 

(2) Condition-based maintenance. Preventive maintenance performed to ascertain the condition of 
an item, detect or forecast an impending failure, or performed as a result of such actions is referred to as 
Condition-based PM. 

(a) A hidden failure of an item is one that has already occurred, has not affected performance of 
the end system, but will if another item fails. Ideally, through some form of warnings or monitoring 
device, no failure will be "hidden." In reality, it is impractical and not always feasible to detect every 
failure of every item in a system and alert the operator or maintainer that the failure has occurred.
Inspections are therefore needed to detect such failures. See paragraph 4-3a(3) for a more complete
discussion of hidden failures. Maintenance that is required to correct a hidden failure condition is, of
course, corrective maintenance.

(b) Some texts use terms such as predictive maintenance and on-condition. The definition of 
condition-based PM used herein includes these concepts. In summary, the objectives of condition-based 
PM are to first evaluate the condition of an item, then, based on the condition, either determine if a hidden 
failure has occurred or a failure is imminent, and then take appropriate action. 

b. Corrective maintenance and run-to-failure. As already alluded to, corrective maintenance (CM) is 
required to restore a failed item to proper operation. 

(1) Restoration. Restoration is accomplished by removing the failed item and replacing it with a 
new item, or by fixing the item by removing and replacing internal components or by some other repair 
action. 

(2) When CM is required. CM can result from system failures or from condition-based PM. 

(a) When system operation is impaired by the failure of one or more items, the operator is usually
alerted to the problem immediately. This alert may come from obvious visual or sensory signals (i.e.,
the operator can see, hear, or feel that a problem has occurred) or from monitoring equipment (indicators, 
built-in diagnostics, annunciator lights, etc.). When the alert comes from the latter, it is possible that a 
system failure has in fact not occurred. That is, the detecting equipment itself has failed or a transient 
condition has occurred resulting in an indication of system failure that is false or cannot be duplicated. 
Whether or not an actual system failure has occurred, any indication that one has will necessitate CM. 
The CM may result in a Cannot Duplicate (CND) or Retest OK (RTOK), in-place repair, or replacement. 
CNDs and RTOKs are serious problems in very complex systems for two reasons. First, they consume 
maintenance time and can cause unnecessary loss of system availability. Second, without in-depth test
and analysis, one cannot be certain whether the detecting equipment failed, the system actually failed, or
a transient caused a failure indication that is not evident except under those transient conditions.

(b) When inspection or condition monitoring detects a hidden or impending failure, some form of
corrective maintenance is required. 

(c) If the only concern were to obtain the greatest possible amount of life from an item, it would 
be allowed to run-to-failure. Under a run-to-failure approach, only CM would be required. No PM would 
be performed. However, the consequences on economics, safety, and mission requirements of some 
failures make a run-to-failure approach untenable. Consequently, most practical maintenance programs 
consist of a combination of PM and CM. Determining what combination is "right" for an item is one of 
the objectives of the RCM process. 

3-4. Maintenance concepts 

a. Levels of maintenance. In considering how maintenance can be categorized, the idea of levels of 
maintenance was introduced. The term "levels of maintenance" has traditionally been used by the 
military services, although its use is not unknown in commercial industry. Within the services, the norm 
was once three levels of maintenance (line or organizational, field or shop, and depot). Under a 3-level 
concept, items are either repaired while installed on the end product or are removed and replaced. 
Various terms are used to refer to an item that is removed and replaced and include Line Replaceable Unit 
(LRU) and Weapon Replaceable Assembly (WRA). For convenience, LRU will be used in this document 
to refer to items that are normally removed from and replaced on the end product. 

(1) The benefits of a 2-level maintenance concept. In an effort to reduce costs and increase
availability, the services have been working for several years to implement a 2-level maintenance 
concept. Under this concept, repairs made on the system are kept to a minimum and, whenever possible, 
consist of remove and replace (R&R) actions. The idea is that by making R&R the preferred maintenance 
on the product, the downtime of the system can be kept to a minimum. Failed items are then sent back to 
the second level of maintenance, usually a depot or original equipment manufacturer (OEM). 

(2) Making a 2-level concept work. A 2-level maintenance concept will only be affordable and 
practical if three criteria are met. First, each LRU's reliability must be "sufficiently high" given the item's 
cost. If not, availability will suffer, due to an excessive number of high-cost spares failing, and the supply 
"pipeline" will be expensive. Second, the integrated diagnostic capability (Built-in Test, Automatic Test 
Equipment, manual methods, etc.) must be very accurate and reliable. Otherwise, the supply pipeline to 
the second level of maintenance will be filled with good LRUs mistakenly being sent for repair - CNDs 
and RTOKs are a serious problem under any maintenance concept but spell disaster for a 2-level 
maintenance concept. Finally, a responsive and cost-effective means of transporting LRUs between the 
field and the depot must be available. 

b. Centralized versus decentralized. When maintenance at a given level is performed at several
locations relatively close to the end user, a decentralized maintenance concept is being
implemented. For example, suppose a 3-level maintenance concept is being used. When an LRU fails at 
an operating location, it is removed and replaced with a good LRU. The operating location sends the 
failed LRU to a co-located field repair activity (FRA) where it is repaired. Such repair can consist of 
either in-place repair or R&R of constituent components often called Shop Replaceable Units (SRUs). 
Under a centralized concept, each operating location would not have a co-located FRA. Instead, one or 
more centralized FRAs would be strategically located throughout the geographic operating area (i.e., 
country, continent, hemisphere, etc.). Each operating location would ship its failed LRUs to the nearest 
centralized FRA. Such a concept is most effective when the LRUs are highly reliable. If the reliability is 
high, then few failures will occur at any given operating location making it difficult to keep the 
technicians proficient in repairing the LRUs. Also, with few failures, the technicians and any support 
equipment (e.g., automatic test equipment) will be underutilized. Under such conditions, it is difficult to
justify a co-located FRA. 

3-5. Packaging a maintenance program 

The total maintenance requirements for a product will dictate a set of preventive maintenance (PM) tasks 
and a set of corrective maintenance (CM) tasks. The latter tasks are essentially "maintenance on demand" 
and by definition cannot be predicted. PM, as discussed previously, will consist of on-condition and 
scheduled maintenance. Once all PM tasks have been identified, they must be grouped, or packaged. By 
packaging PM tasks, we can use our maintenance resources more effectively and minimize the number of 
times that the system will be out of service for PM. 

a. Packaging example. An example is shown in figure 3-3. We could have conducted the pump 
inspection at 28 hours, the panel inspection at 22 hours, and lubricated the gearbox at 25 hours. But it is 
much more efficient to "package" the tasks as shown in the example. 

PM Tasks Identified through RCM

Inspect hydraulic pump every 28 operating hours (OH) for leaks 
Remove and replace the pulley belts every 150 OH 
Lubricate all moving mechanical parts in the gearbox every 25 OH 
Monitor vibration levels in the drive shaft and remove and replace when 
levels defined in the maintenance manual are exceeded 

Inspect access panels for loose or missing fasteners every 22 OH 



Other Inputs 

Maintenance staffing levels 
Operating concept 
Mission requirements 
Etc. 



Packaged PM Tasks 

• Conduct the following PM every 25 OH 

- Inspect hydraulic pump for leaks 

- Inspect access panels for loose or missing fasteners 

- Lubricate all moving mechanical parts in the gearbox 

• Remove and replace the pulley belt every 150 OH 

• Monitor vibration levels in the drive shaft and remove and replace 
when levels defined in the maintenance manual are exceeded 



Figure 3-3. An example of packaging PM tasks. 
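
The grouping in figure 3-3 can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `package_tasks` and the `tolerance` parameter are hypothetical, and a real packaging decision also weighs the other inputs named in the figure (staffing levels, operating concept, mission requirements). Note that moving a 22 OH task into a 25 OH package lengthens its interval, so the engineer must confirm that the longer interval is acceptable.

```python
def package_tasks(tasks, package_interval, tolerance):
    """Group interval-based tasks whose interval lies within `tolerance`
    of `package_interval`; all other tasks keep their own schedule.

    `tasks` is a list of (name, interval_in_operating_hours) pairs.
    An interval of None marks a condition-based task with no fixed interval.
    """
    packaged, standalone = [], []
    for name, interval in tasks:
        if interval is not None and abs(interval - package_interval) <= tolerance:
            packaged.append(name)
        else:
            standalone.append((name, interval))
    return packaged, standalone


# Task intervals taken from figure 3-3; the 3-hour tolerance is an assumption.
tasks = [
    ("Inspect hydraulic pump for leaks", 28),
    ("Remove and replace the pulley belts", 150),
    ("Lubricate all moving mechanical parts in the gearbox", 25),
    ("Monitor vibration levels in the drive shaft", None),
    ("Inspect access panels for loose or missing fasteners", 22),
]

packaged, standalone = package_tasks(tasks, package_interval=25, tolerance=3)
print("Every 25 OH:", packaged)
print("Own schedule:", standalone)
```

With these inputs, the three tasks near 25 OH fall into one package, while the 150 OH belt replacement and the vibration monitoring task keep their own schedules, matching the packaged list in the figure.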

b. Document the packaging for maintenance personnel. One method of documenting the packaging of 
PM tasks is to create inspection cards. For a given point in time (calendar time, number of operating 
hours, etc.), a set of cards defines the PM tasks to be performed. Figure 3-4 illustrates this approach. 

[Figure 3-4 shows a stack of four overlapping PM cards. Text hidden by the overlapping cards is marked "...".

Card 1 of 4 (25-Hour PM)
   Item: Hydraulic Pump, Part number HP23145
   Quantity: One
   Task: Inspection and Lubrication Service
   Instructions:
      - Inspect hydraulic pump for leaks
      - Inspect access panels for loose or missing fasteners
      - Lubricate all moving mechanical parts in the gearbox

Card 2 of 4 (500-Hour PM)
   Item: Accessory belt, AB1189-Z
   Quantity: One
   Task: Inspect for excessive wear
   Instructions: Open access panel AP-ADS by turning quick-disconnect fasteners counter- ...

Cards 3 of 4 and 4 of 4 (500-Hour PM)
   Mostly hidden. Visible text includes Item: Bearing assembly, BA32-19876; Quantity: One.]



Figure 3-4. Example of how PM cards can be used to document required PM tasks. 

CHAPTER 4
FUNDAMENTAL CONCEPTS OF A RELIABILITY-CENTERED MAINTENANCE PROGRAM



4-1. Objectives of RCM 

This chapter provides a discussion of the two primary objectives of RCM: Ensure safety through 
preventive maintenance actions, and, when safety is not a concern, preserve functionality in the most 
economical manner. For C4ISR facilities, mission should be considered at the same level as safety. 

4-2. Applicability of preventive maintenance 

a. Effectiveness. PM can be effective only when there is a quantitative indication of an impending
functional failure or an indication of a hidden failure. That is, if reduced resistance to failure can be detected
(potential failure) and there is a consistent or predictable interval between potential failure and functional
failure, then PM is applicable. Condition monitoring has long been used to monitor operating parameters
that have been shown to be dependable predictors of an impending failure. Age limit information can also
be used to determine the effectiveness of preventive maintenance efforts (see figure 2-2). Preventive
maintenance (PM) is effective if a potential failure condition is definable or there is a quantitative
indication of an impending failure. PM is generally effective only for items that wear out. It has no
benefit for items that have a purely random pattern of failure (i.e., failures are exponentially distributed
and the failure rate is constant; see appendix B for a discussion of statistical distributions).
Consequently, we rarely, if ever, use a PM action for electronics, since electronics exhibit a random
pattern of failures. Mechanical items, on the other hand, usually have a limited useful period of life and
then begin to wear out.

b. Economic viability. The costs incurred with any PM being considered for an item must be less than 
for running the item to failure. The failure may have operational or non-operational consequences. The 
costs to be included in such a comparison for these two failure consequences are Operational and Non- 
operational. 

(1) Operational. The cost of repair is defined in (2) following. The operational cost is defined as 
the indirect economic loss as a result of failure plus the direct cost of repair. An example of an 
operational cost is the revenue lost by an airline when a flight must be canceled and passengers booked
on another airline. For military organizations where profit is not an objective, an operational cost might be
the cost of a second flight or mission. Sometimes, it may be difficult for a military organization to 
quantify an operational cost in terms of dollars and a subjective evaluation may be needed. 

(2) Non-operational. The non-operational cost is defined as the direct cost of repair. The direct cost 
of repair is the cost of labor, spare parts, and any other direct costs incurred as a result of repairing the 
failure (by removing and replacing the failed item or performing in-place repair of the item). 
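
The comparison described in this paragraph can be sketched as follows. This is a simplified illustration, not a prescribed costing method: the function names and all dollar figures are hypothetical, and the sketch assumes that PM prevents every failure expected in the period, which will not hold exactly in practice.

```python
def cost_per_failure(direct_repair_cost, indirect_loss=0.0):
    """Direct cost of repair plus any indirect economic loss from the failure.

    With indirect_loss left at zero this is the non-operational cost;
    with an indirect loss added it is the operational cost.
    """
    return direct_repair_cost + indirect_loss


def pm_is_economical(pm_cost, expected_failures, failure_cost):
    """True if PM over a period costs less than running the item to failure.

    Simplifying assumption: PM prevents every failure expected in the period.
    """
    run_to_failure_cost = expected_failures * failure_cost
    return pm_cost < run_to_failure_cost


# Hypothetical figures for one year of operation.
direct_repair = 2_000.0    # labor, spare parts, and other direct costs
indirect_loss = 15_000.0   # e.g., revenue lost when a flight is canceled
pm_cost = 5_000.0          # total cost of the PM tasks over the year
expected_failures = 0.5    # failures expected per year without PM

operational = cost_per_failure(direct_repair, indirect_loss)
non_operational = cost_per_failure(direct_repair)

print(pm_is_economical(pm_cost, expected_failures, operational))      # True
print(pm_is_economical(pm_cost, expected_failures, non_operational))  # False
```

With these assumed figures, the same PM task is justified when the failure has operational consequences but not when the only penalty is the direct cost of repair.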

c. Preservation of function. The purpose of RCM is not to prevent failures but to preserve functions. 
Many maintenance people who are unfamiliar with RCM initially find this idea difficult to accept. As 
was discussed in paragraph 1-4, for many years prior to and following World War II, the "modern" view 
within the maintenance community was that every effort should be made to prevent all failures. 
Preventing failure was the focus of every maintenance technician. But products became increasingly
complex and maintenance costs increased both in absolute terms and as a percentage of a product's total
life cycle costs. It was soon clear that preventing all failures was technically and economically 
impractical. Instead, attention was turned to preserving all of the essential functions of a product. This 
shift from preventing failures to preserving function was fundamental to the development of the RCM 
approach to defining a maintenance program. 

4-3. Failure 

For RCM purposes, three types of failures are defined: functional, evident, and hidden. 
a. Types of failures. 

(1) Functional failure. A functional failure is one in which a function of the item is lost. A 
functional failure directly affects the mission of the system. To be able to determine that a functional 
failure has occurred, the required function(s) must be fully understood. As part of a Failure Modes and 
Effects Analysis (FMEA), all functions have been defined. This definition can be very complex for 
products that have varying levels of performance (e.g., full, degraded, and loss of function). 

(2) Evident failures. When the loss of a function can be observed or is made evident to the operator, 
the failure is said to be evident. In the latter case, dials or displays, audible or visual alarms, or other 
forms of instrumentation alert the operator to the failure. 

(3) Hidden failures. A hidden failure is a functional failure of an item that has occurred, has not 
affected performance of the end system, and is not evident to the operator, but will cause a functional 
failure of the end system if another item fails. In other words, because of redundancy or the nature of the 
item's function in the system, no single-point failure of the end system has occurred. If, on the other 
hand, multiple failures occur, then the system will fail to perform its function. A simple example is the 
system shown in figure 4-1. Either of the two redundant items, A and B, can perform a critical function. 
Redundancy was used because the function is critical and a single point failure was unacceptable. If 
either item A or B can fail without the knowledge of the operator, it is considered a hidden failure. The 
system would now be subject to a single point failure (i.e., the function can be lost by one more failure - 
the failure of the other redundant component). Hidden failures must be found by maintenance personnel. 



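[Figure 4-1 is a block diagram: the Input feeds two redundant items, A and B, connected in parallel,
and both feed the Output.]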



Figure 4-1. Block diagram of a simple redundant system. 
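
The effect of a hidden failure on the system in figure 4-1 can be illustrated numerically. This is a minimal sketch that assumes items A and B fail independently; the reliability value of 0.95 is hypothetical.

```python
def parallel_reliability(reliabilities):
    """Reliability of redundant items in parallel.

    The function is lost only if every item fails, so the system
    reliability is one minus the product of the item unreliabilities.
    Assumes the items fail independently of one another.
    """
    prob_all_fail = 1.0
    for r in reliabilities:
        prob_all_fail *= (1.0 - r)
    return 1.0 - prob_all_fail


r_item = 0.95  # hypothetical reliability of item A and of item B

# Both redundant items working at the start of the mission.
both_working = parallel_reliability([r_item, r_item])

# Item A has already failed without the operator knowing (hidden failure),
# so its reliability for the mission is zero.
hidden_failure_of_a = parallel_reliability([0.0, r_item])

print(f"both items working:  {both_working:.4f}")
print(f"hidden failure of A: {hidden_failure_of_a:.4f}")
```

With the hidden failure present, the system is no more reliable than item B alone: the redundancy that was designed in has been lost without any indication to the operator, which is why inspections for hidden failures are needed.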

b. Failure consequences. A basic objective of the RCM analysis is to make decisions regarding the 
selection of a maintenance action for a specific functional failure of a specific item based on the 
consequence of the failure. Three categories of failure consequences are generally used. They are safety, 
operational (mission), and economic. 

(1) Safety. If a functional failure has a direct adverse effect on operating safety, the failure effect
is categorized as Safety. The functional failure must cause the effect by itself and not in combination
with other failures. That is, the failure must be a single-point failure. (Note that a hidden failure for
which no preventive maintenance is effective and which, in combination with another failure, would
adversely affect safety must be treated as a safety-related failure. The methodology is designed to address
this situation).

(a) An adverse effect on safety means that the result of the failure is extremely serious or
catastrophic. Results can include property damage, injury to operators or other personnel, death, or some
combination of these.

(b) In some industries, this category is expanded to include failures that result in a federal statute
being violated. An industry such as the petroleum or power industry often includes failures that would
result in violations of the Environmental Protection Act. Other industries may include failures with other
effects in this category.

(2) Operational. When the failure does not adversely affect safety but prevents the end system from
completing a mission, the failure is categorized as an Operational failure. For many end systems,
operational failure results in loss of revenue. In other cases, a critical objective cannot be met. See table
4-1 for examples.

Table 4-1. Examples of effects of operational failures

End System - Effect of Operational Failure

Airliner - Airline must cancel flight and either send passengers to another airline or add a
flight. In either case, revenue is adversely affected.

Manufacturing equipment - Production must be halted until repairs are made, adversely affecting sales.
Some orders may be canceled because delivery dates cannot be met (unless no other sources can provide
the product to the customers; in that case, loss of customer confidence may result, affecting future sales).

Military aircraft - Prolonged or lost conflict, inability to respond to a political crisis in a timely
manner, or exposure to a period of vulnerability.

Financial information system - Loss of revenue due to an inability to make timely investments, penalties
due to late payments, etc.

C4ISR Facility - Facility cannot provide necessary electrical power to support an assigned mission.



(3) Economic. When a functional failure does not adversely affect safety and does not adversely 
affect operations, then the failure is said to have an Economic effect. The only penalty of such a failure is 
the cost to repair the failure. 
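
The three consequence categories can be expressed as a simple decision rule. The sketch below is illustrative only: the function and parameter names are assumptions introduced here, and the rule simply restates the definitions above, including the note that a hidden failure with no effective PM task which could combine with another failure to affect safety is treated as safety-related.

```python
from enum import Enum


class Consequence(Enum):
    SAFETY = "Safety"
    OPERATIONAL = "Operational"
    ECONOMIC = "Economic"


def classify_consequence(direct_safety_effect: bool,
                         hidden_safety_combination: bool,
                         prevents_mission: bool) -> Consequence:
    """Classify a functional failure by its consequence.

    direct_safety_effect: the failure, by itself, adversely affects safety.
    hidden_safety_combination: the failure is hidden, no PM task is effective,
        and in combination with another failure it would affect safety.
    prevents_mission: the failure prevents the end system from completing
        its mission.
    """
    if direct_safety_effect or hidden_safety_combination:
        return Consequence.SAFETY
    if prevents_mission:
        return Consequence.OPERATIONAL
    # Neither safety nor operations is affected: only the repair cost remains.
    return Consequence.ECONOMIC


print(classify_consequence(False, False, True).value)   # Operational
print(classify_consequence(False, False, False).value)  # Economic
```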

4-4. Reliability modeling and analysis 

The following is a brief discussion of reliability modeling in general and the GO method, used for 
facilities such as C4ISR facilities. For an in-depth discussion, see TM 5-698-1. 

a. Reliability modeling. To evaluate the reliability characteristics of a system, and its constituent 
elements, a model is needed. Table 4-2 lists some of the methods most often used to model reliability. 






Table 4-2. Methods for modeling reliability

Method - Comment

Reliability Block Diagram - A method of modeling that uses series and parallel connections to represent a
system. The series connections represent opportunities for single point failures. Parallel connections
represent redundancy.

Fault Tree - A top-down analysis useful for identifying multiple failure conditions and the effect of
human operation and maintenance on system failure. Useful for developing troubleshooting procedures.

Single Line Diagram - Used for GO analysis (see paragraph 4-4b).



(1) Reliability block diagram (RBD). Figure 4-2 is an example of an RBD. The system consists of 
five subsystems. Subsystems B, D, and E are all instances where one failure can cause the system to fail; 
i.e., each of these subsystems is like the link in a chain and if one fails, the "chain" fails. Subsystems A 
and C have redundancy. Subsystem A will fail to perform its system function only if both item 1 and 1A 
fail. Likewise, subsystem C will fail to perform its system function only if both item 3 and 3A fail. If the 
reliabilities of items 1, 1A, 2, 3, 3A, 4, and 5 are known, the reliability of the system can be calculated
(see TM 5-698-1).

Figure 4-2. Example of a reliability block diagram.
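
The calculation for a block diagram of this shape can be sketched as follows. This is a minimal illustration, not the procedure of TM 5-698-1: it assumes that all items fail independently, that subsystems B, D, and E correspond to items 2, 4, and 5, and that the item reliabilities are the hypothetical values shown.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """All blocks must work: multiply the reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one block must work: one minus the product of unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical item reliabilities (illustrative only, not from the manual).
r = {"1": 0.90, "1A": 0.90, "2": 0.99, "3": 0.95, "3A": 0.95, "4": 0.98, "5": 0.97}

system_reliability = series(
    parallel(r["1"], r["1A"]),   # subsystem A: redundant items 1 and 1A
    r["2"],                      # subsystem B (assumed to be item 2)
    parallel(r["3"], r["3A"]),   # subsystem C: redundant items 3 and 3A
    r["4"],                      # subsystem D (assumed to be item 4)
    r["5"],                      # subsystem E (assumed to be item 5)
)

print(f"System reliability: {system_reliability:.4f}")
```

Because subsystems B, D, and E are in series, the system reliability under these assumptions can never exceed the reliability of the weakest of those three items, regardless of how much redundancy is added to subsystems A and C.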

(2) Fault tree. Figure 4-3 is an example of a fault tree developed for one type of failure in an 
elevator (passenger box falls free).

Figure 4-3. Example of a fault tree (from RAC Fault Tree Analysis Application Guide).

b. The GO method. The GO software was originally designed to address the availability needs of
nuclear facilities. Unlike fault tree analysis, which focuses on a single system event and uses two-state
(good/bad) elements, the GO method is a comprehensive system analysis technique that addresses all
system operational modes and is not restricted to two-state elements. GO is not a simulation package but
a tool that uses point estimates of component reliabilities to calculate the desired system metrics. The GO
procedure has been enhanced over the years to incorporate special modeling considerations, such as
system interactions and dependencies, as well as man-machine interactions. Key features of the GO
method are listed in table 4-3.

Table 4-3. Key features of the GO method 



• Models follow the normal process flow; 

• Most model elements have one-to-one correspondence with system elements; 

• Models accommodate component and system interactions and dependencies; 

• Models are compact and easy to validate; 

• Outputs represent all system success and failure states; 

• Models can be easily altered and updated; 

• Fault sets can be generated without altering the basic model; 

• System operational aspects can be incorporated; and 

• Numerical errors due to pruning are known and can be controlled. 



c. Single line diagram. The first step in performing an analysis with GO is to develop the single line
diagram that represents the system. The single line diagram gives the analyst the path that must be
modeled by GO to accurately represent the physical and logical equipment of the system. Figure 4-4
is a single line diagram of the IEEE Gold Book Standard Network System.

Figure 4-4. Example of a single line diagram (from IEEE Gold Book Standard Network).

CHAPTER 5



THE RELIABILITY-CENTERED MAINTENANCE PROCESS 



5-1. Overview 

The overall RCM process was introduced in chapter 2 and is depicted in the process flow chart, figure 2- 
1. This chapter will describe in more detail how the process is implemented. 

5-2. C4ISR candidates for RCM analysis 



It is important to note from the outset that an RCM analysis is not beneficial for all products. The criteria
listed in table 5-1 will help the analyst determine whether an RCM analysis is potentially of value. Three
major systems in C4ISR facilities are candidates for RCM analysis: mechanical systems, electrical
systems, and control systems. All three combine to support the facility's mission and provide the
environmental conditions necessary to maintain operation of critical equipment and personnel. All of the
components listed in paragraph 5-2 are candidates for RCM optimization and require a maintenance
program geared toward the mission requirement of the facility.

Table 5-1. Criteria for applying RCM to products

Criteria - Comment

Product has or is projected to have a large number of PM tasks - Existing product already in service or
new system for which the PM tasks were identified using an approach other than RCM.

Product maintenance costs are or are projected to be very high - Existing product already in service. PM
tasks either identified using an approach other than RCM or RCM requires updating. New system for
which maintenance tasks were identified using an approach other than RCM.

Product requires or is projected to require frequent corrective maintenance - Existing product already in
service. PM program may be inadequate; either identified using an approach other than RCM or RCM
requires updating. New system for which maintenance tasks were identified using an approach other
than RCM.

Hazardous conditions could result from failure - New product, or existing product for which the PM
tasks were identified using an approach other than RCM.



a. Mechanical systems. The types of mechanical systems typical for a C4ISR facility include those 
shown in table 5-2. 





Table 5-2. Types of mechanical systems typical for a C4ISR facility

• Chillers

• Boilers

• Cooling towers

• HVAC distribution equipment including Fan Coil Units

• Valves

• Piping







(1) Other systems. Mechanical systems also include generators, fuel oil delivery systems and 
storage and pumping components. These are critical to the mission of the facility but are frequently 
neglected.

(2) Temperatures. Mechanical systems not only maintain a comfortable environment for the
occupants but are also designed to maintain optimal equipment operating temperatures.

b. Electrical systems. Electrical systems begin at the transformer feeding the building or at the 13.8 kV
feeder and continue through the entire distribution system, generally to the panels containing the 208 or
220/120-volt distribution. Some facility missions require coverage all the way to the operating
equipment at the wall outlet. Typical components of the electrical system include those shown in
table 5-3.



Table 5-3. Typical components comprising the C4ISR facility electrical system

• Transformers, liquid filled and air cooled

• Connections

• Cables

• Switch Gear

• Circuit Breakers

• Motor Control Centers

• Motors

• Cable Connections

• UPS systems including Gel and Wet Cell Lead Acid Batteries







c. Control systems. Control systems are the third major element in making a C4ISR facility as reliable
as possible. They are the brains behind the facility's operational characteristics during normal and
abnormal conditions. Control systems are commonly identified as Supervisory Control and Data
Acquisition (SCADA) systems and are designed to monitor conditions and react in a manner that
maintains a set point. A typical SCADA system consists of a series of sensors that send signals to a central
command center, where the signals are interpreted. The command center then sends signals to actuators
that throttle input conditions to provide the environmental conditions required for mission
operations. Typical components of a SCADA system are shown in table 5-4.



Table 5-4. Typical components for a SCADA system

• Computer access panel

• Digital drivers

• Power Supplies

• PLC

• Interface devices such as control panels or flying circuit breakers.



5-3. RCM data sources 

Conducting an RCM analysis requires an extensive amount of information. Since much of this 
information is not available early in the design phase, RCM analysis for a new product cannot be 
completed until just prior to production. Table 5-5 lists some general sources of data for the RCM 
analysis. The data elements from the Failure Modes and Effects Analysis (FMEA) that are applicable to 
RCM analysis are highlighted in paragraph 5-5b. Note that when RCM is being applied to a product 
already in use, or when a maintenance program is updated during Life Exploration, historical maintenance 
and failure data will be inputs for the analysis.

Table 5-5. General data sources for the RCM analysis

Data Source - Comment

Lubrication requirements - Determined by designer. For off-the-shelf items being integrated into
the product, lubrication requirements and instructions may be available.

Repair manuals - For off-the-shelf items being integrated into the product.

Engineering drawings - For new and off-the-shelf items being integrated into the product.

Repair parts lists

Quality deficiency reports - For off-the-shelf items being integrated into the product.

Other technical documentation - For new and off-the-shelf items being integrated into the product.

Recorded observations - From test of new items and field use of off-the-shelf items being
integrated into the product.

Hardware block diagrams - For new and off-the-shelf items being integrated into the product.

Bill of Materials - For new and off-the-shelf items being integrated into the product.

Functional block diagrams - For new and off-the-shelf items being integrated into the product.

Existing maintenance plans - For off-the-shelf items being integrated into the product. Also may be
useful if the new product is a small evolutionary improvement of a previous product.

Maintenance technical orders/manuals - For off-the-shelf items being integrated into the product.

Discussions with maintenance personnel and field operators - For off-the-shelf items being integrated
into the product. Also may be useful if the new product is a small evolutionary improvement of a
previous product.

Results of FMEA, FTA, and other reliability analyses - For new and off-the-shelf items being integrated
into the product. Results may not be readily available for the latter.

Results of Maintenance task analysis - For new and off-the-shelf items being integrated into the product.
Results may not be readily available for the latter.



a. C4ISR data sources. RCM related data may be obtained from several different types of sources. 
Some potential sources of maintainability data include those shown in table 5-6. 

Table 5-6. Potential sources of C4ISR maintainability data 



• Historical data from similar products used in similar conditions (PREP Database, IEEE
Gold Book)

• Product design or manufacturing data

• Test data recorded during demonstration testing

• Field data



(1) Expressing data. The data may be expressed in a variety of terms. These include observed
values or modified values (true, predicted, estimated, extrapolated, etc.) of the various maintainability
measures. Some precautions are therefore necessary regarding the understanding and use of such data, as
shown in table 5-7.

Table 5-7. Understanding and using different sources of data

Historical - Used primarily during the concept definition phase to generate specification
requirements. In later phases, historical data may be compared with actual data obtained for the
product. They can also serve as additional sources of information for maintainability verification.

Product Design and Manufacturing - Data obtained through the use of design analysis or prediction,
or from data generated during the design phase or the manufacturing phase. Design data may be
used as the basis for product qualification and acceptance, and for review and assessment of the
relevancy of historical data and the validity of previous assessments. Before this type of data is used in
an analysis, the analyst must understand the data collection and analysis methodology, why the specific
method was chosen, and any possible limitations.

Product Demonstration and Field - These data are essential for sustaining engineering activities
during the in-service phase of the system life cycle. They include maintainability-related data
obtained from formal or informal demonstration tests on mock-ups, prototypes, or production
equipment in either a true or simulated environment, or data generated during actual item use.



(2) Other data categories. Other categories of data that would be beneficial to collect include 
information on the maintenance support conditions. Operational maintainability may not be determined 
solely by inherent maintainability, but by logistical factors. Therefore information to be collected should 
include shortages in spares (due to inadequate initial provisioning, long pipeline times, etc.), test 
resources, and human resources. Such data are important for determining why a system's maintainability,
as measured in the field, may not be meeting the values expected based on the design data.

(3) SCADA systems. SCADA systems are excellent data collection mechanisms, provided the
system is initially designed to capture critical information. They can also be used to monitor trends in
component operating conditions and so provide information for proactive logistics supply.
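
As one illustration of trend monitoring, logged readings can be fitted with a straight line to estimate when a monitored condition will reach a limit. The following is a minimal sketch under stated assumptions: the readings, the limit, and the function names are hypothetical, and a simple least-squares line is only one of many possible trending techniques.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time, measured value)


def linear_trend(samples: List[Sample]) -> Tuple[float, float]:
    """Return the least-squares slope and intercept for (time, value) pairs."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_limit(samples: List[Sample], limit: float) -> Optional[float]:
    """Estimate the time at which the fitted trend reaches the limit.

    Returns None when the trend is flat or falling, i.e., the limit is
    never reached under the linear assumption.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Hypothetical weekly bearing temperature readings: (week, degrees C).
readings = [(0, 60.0), (1, 62.0), (2, 64.0), (3, 66.0)]
print(time_to_limit(readings, limit=80.0))
```

An estimate of this kind could be used to order spares or schedule a task before the limit is reached, which is the sense in which trend data support proactive logistics.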

5-4. PM tasks under RCM 

a. Lubrication and servicing task. Many mechanical items in which movement occurs require 
lubrication. Examples include internal combustion engines that require oil and periodic replacement of 
that oil (and associated filters). Lubrication and servicing tasks are sometimes overlooked due to their
relative simplicity and because they are "obvious." Prior to the latest version of the airlines' RCM
approach, lubrication and servicing tasks were often omitted from the decision logic tree, with the
understanding that such tasks could not be ignored. In the current MSG-3, these tasks are explicitly
included in the decision logic, as they are in this document.

b. Inspection or functional check task. Inspections normally refer to examinations of items to ensure 
that no damage, failure, or other anomalies exist. Inspections can be made of: an entire area (e.g., the 
body or "under the hood"), a subsystem (e.g., the engine, controls, or feed mechanism), and a specific 
item, installation, or assembly (e.g., the battery, shaft, or flywheel). 



(1) Visual inspections or checks. These are checks conducted to determine that an item is 
performing its intended function. The check may be performed by physically operating the item and 
observing parameters on displays or gauges, or by visually looking to see if the function is being 
performed properly. In neither case are quantitative tolerances required. A functional check consists of 
operating an item and comparing its operation with some pre-established standard. Functional checks 
often involve checking the output of an item (e.g., pressure, torque, voltage, or power) and checking to 
determine if the output is acceptable (i.e., within a pre-established range, greater than a pre-established
minimum value, or less than a pre-established maximum value). These checks are conducted as
failure-finding tasks.
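
The acceptance logic of a functional check can be sketched in a few lines. The function name, the pump example, and the limits below are hypothetical; the sketch only restates the three cases named above (within a range, above a minimum, or below a maximum).

```python
from typing import Optional


def functional_check(measured: float,
                     minimum: Optional[float] = None,
                     maximum: Optional[float] = None) -> bool:
    """Return True if the measured output meets the pre-established standard.

    Supply both limits for a range check, only `minimum` for a
    greater-than check, or only `maximum` for a less-than check.
    """
    if minimum is not None and measured < minimum:
        return False
    if maximum is not None and measured > maximum:
        return False
    return True


# Hypothetical example: pump discharge pressure must be 40 to 60 psi.
print(functional_check(52.0, minimum=40.0, maximum=60.0))  # True
print(functional_check(35.0, minimum=40.0, maximum=60.0))  # False
```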

(2) Use of NDI. Inspections may consist of purely visual examinations or be made using special
techniques or equipment. Many inspections require the special capability of non-destructive inspection 
(NDI) techniques. Table 5-8 lists some of the NDI methods available to maintenance personnel. 

c. Restoration task. Many items, primarily mechanical, wear out as they are used. At some point, it 
may be necessary, and possible, to restore the item to "like new" condition. Examples include internal 
combustion engines, electric motors, and pumps. 

d. Discard task. Some items upon failure or after their useful life has been reached (i.e., they are worn 
out), cannot be repaired or restored. These items must be discarded and replaced with a new item 
identical in function. Examples include seals, fan belts, gaskets, screws (stripped threads), and oil filters. 

5-5. The RCM process 

The objective of conducting an RCM analysis is to rank all included equipment and systems by their
relative importance, and risk, to the overall facility mission, and to prescribe PM tasks based on
subsystem and system ranking. The RCM process is outlined below as an expansion of figure 2-1 and is
described in the text that follows.

FMECA



Define the System - Identify and document the boundaries of the analysis 
Identify and document equipment included in the analysis 
Identify and document the indenture level the analysis is intended to extend to 

Define Ground Rules and Assumptions - Identify and document ground rules and assumptions 
used to conduct the analysis 

Construct Equipment Tree - Construct equipment block diagrams to indicate equipment 
configuration, down to the lowest indenture level intended to be covered by the analysis 

Identify Failure Modes - Identify the potential failure modes for the analyzed equipment at the 
indenture levels covered by the analysis

Analyze Failure Effects - Analyze the effects of the identified failure modes on the lowest levels
of indenture and above

Classify Effect Severity - Classify the effects of the identified failure modes on the lowest levels 
of indenture and above 

Identify Detection Method - Identify and classify the methods, in place, by which potential 
failures may be detected or avoided 

Perform Criticality Calculations - Perform Criticality Analysis 

Identify Critical Items - Identify items within the analysis that ranked highly critical 

Assign Maintenance Focus Levels - Classify maintenance focus levels based on criticality 
rankings 

Apply RCM Decision Logic - Apply RCM logic trees for items, especially those identified as 
being critical (see figure 5-2) 

Identify Maintenance Tasks - Identify maintenance tasks to be performed on the given item 

Package Maintenance Program - Develop a maintenance tasking schedule for the analyzed 
equipment 

Note: RCM Analysis is intended to be a living analysis. Effort should be made to continue to collect more 
complete information and add it to the analysis, to continue to provide a foundation for effective 
continuous improvement. Results and recommendations should be periodically reviewed and re- 
evaluated, taking into consideration additional information of any kind. 

a. Identify the system configuration. Since the RCM analysis usually begins before the final design has 
been completed, the system configuration is changing. Even when the design is complete, model changes 
can be made. The configuration, of course, determines how functions are performed, the relationship of 
items within a product, and so forth. Consequently it is important that the precise configuration of the 
product or system for which the RCM analysis is being conducted be documented as part of the analysis. 
It is also important that the analysis be updated to account for any changes in the configuration (some of 
which may be required as a direct result of the RCM analysis itself). 

b. Perform an FMEA and other analyses. To perform the RCM analysis, many pieces of information
are needed. These include the information shown in table 5-9. Early in design, such information will
probably be unknown or unreliable. For that reason, the RCM analysis should not be
started until sufficient and reasonably stable information is available. The objective, however, is to
develop and complete the initial maintenance program before the product is transferred to the
customer.

RCM analysis can be conducted using a traditional quantitative, qualitative, or flexible approach.

• The traditional quantitative approach can be used when there is sufficient failure rate data available to
calculate criticality numbers. A quantitative approach is the preferred analysis method. However,
to be effective, high levels of failure-specific data must be available. When specific failure rates
for specific failure modes and failure mechanisms are unavailable, analysis must be conducted
qualitatively.

• Qualitative analysis must be used when specific part or item failure rates are not available.
Failure mode ratio and failure mode probability are therefore not used in this analysis. Instead,
the equipment is ranked in terms of discrete occurrence levels. Under traditional qualitative
analysis, severity, occurrence, and detection method levels are determined subjectively and
used to produce a component risk assessment.

• The flexible technique is derived from traditional qualitative analysis. Under this approach, RPN
calculations are generated by the same formula as in the traditional qualitative approach.
However, the arguments of the component level RPN calculation (O, S, D) are defined
differently.

RPN = O × S × D

Where: 

RPN = Risk associated with failure mode (Risk Priority Number) 

S = Severity level for failure mode (subjective) 

O = Occurrence level for failure mode (Reliability Data) 

D = Detection method level (Subjective) 
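
The RPN calculation and the resulting ranking can be sketched as follows. The failure modes, the level values, and the assumed 1-to-10 scale are hypothetical and are used only to show the arithmetic; they are not taken from this manual.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    occurrence: int  # O: occurrence level
    severity: int    # S: severity level
    detection: int   # D: detection method level

    @property
    def rpn(self) -> int:
        """Risk Priority Number: RPN = O x S x D."""
        return self.occurrence * self.severity * self.detection


# Hypothetical failure modes with levels on an assumed 1-to-10 scale.
modes = [
    FailureMode("Fan motor winding open", occurrence=4, severity=7, detection=3),
    FailureMode("Belt slippage", occurrence=6, severity=4, detection=2),
    FailureMode("Dirty coils", occurrence=5, severity=3, detection=7),
]

# Rank from highest to lowest risk.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name}: RPN = {mode.rpn}")
```

In this illustration the failure mode with the lowest severity ranks highest because it is assumed to be the hardest to detect, which shows how the three factors interact in the product.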

(1) Other inputs. When FTAs are needed to understand the effects of, for example, multiple 
failures, the information derived from these analyses can also be valuable inputs to the RCM analysis. 



Table 5-8. NDI techniques

Legend: C = Cracks; W = Wear; F = Fractures; CR = Corrosion; E = Erosion; L = Leaks;
MA = Material Analysis; MC = Material Conditions; S = Stress; D = Deformation;
MT = Material Thickness; DT = Deposit Thickness; PR = Physical Restrictions.
X marks a main application of the method; a dash (-) marks an empty cell.

NDE Method                                  C   W   F   CR  E   L   MA  MC  S   D   MT  DT  PR  OTHER  Remarks
1 Acoustic cross correlation                -   -   -   -   -   X   -   -   -   -   -   -   -   -      Locating buried pipes
2 Acoustic emission                         X   -   X   -   -   X   -   X   -   X   -   -   -   X      Internal structural noise
3 Coating thickness                         -   -   -   -   -   -   -   -   -   -   -   X   -   X      Magnetic methods and eddy currents. Ferrite content of ferritic-austenitic steels
4 Dye penetrant                             X   -   X   -   -   X   -   -   -   -   -   -   -   -      Including the chalk, water, alcohol methods
5 Eddy current testing                      X   X   X   X   X   X   -   -   -   X   X   -   -   X      Heat exchanger tubes, wire rope, surface checks, sorting
6 Emission spectroscopy (Metascope)         -   -   -   -   -   -   X   -   -   -   -   -   -   -      Low and high alloy steels. Including X-ray fluorescence
7 Endoscopy                                 X   X   X   X   X   X   -   -   -   -   -   X   X   -      Inspection of internal surface
8 ER-probe                                  -   -   -   X   -   -   -   -   -   -   -   -   -   -      Average corrosion rates
9 Ferrography                               -   X   -   -   -   -   -   -   -   -   -   -   -   -      Lubricated mechanical systems
10 Hardness testing                         -   -   -   -   -   -   -   X   -   -   -   -   -   -      Brinell, Vickers, Rockwell B, C&N, Rockwell superficial, Knoop, Shore, Scleroscope, Equotip, UCI
11 Hydrogen cell                            -   -   -   X   -   -   -   -   -   -   -   -   -   -      Average corrosion rates
12 Isotope techniques                       -   X   -   -   -   X   -   X   -   -   X   X   X   X      Tracer tech., ball test, radiometry, collim. photon
13 Laser distance measurements (optocator)  -   X   -   -   -   -   -   -   -   -   X   -   -   X      Topography, symmetry
14 Leak testing resistance                  -   -   -   -   -   X   -   -   -   -   -   -   -   X      Liquid penetrant, ultrasonics, pressure change, foam, tracers, sulphur diffusion, ozalide paper, halogen
15 LPR-probe, polarization                  -   -   -   X   -   -   -   -   -   -   -   -   -   -      Instantaneous corrosion rate
16 Magnetic plugs                           -   X   -   -   -   -   -   -   -   -   -   -   -   -      Lubricated mechanical systems
17 Magnetic particle examination            X   -   -   -   -   -   -   -   -   -   -   -   -   X      Weld defects, laminations - only ferromagnetic materials
18 Mechanical calibration                   -   X   -   X   X   -   -   -   -   -   X   X   -   X      Physical dimensions
19 NDE method combination                   X   X   X   X   X   X   X   X   X   X   X   X   X   X      Check of entire component condition. Predictive programs
20 NDE methods under development            (X) -   -   -   -   -   -   (X) (X) (X) -   -   -   (X)
20.1 SPAT                                   -   -   -   -   -   -   -   -   X   -   -   -   -   -      Stress pattern analysis by thermal emission
20.2 Pulsed video thermography (PVT)        -   -   -   -   -   -   -   X   -   -   -   -   -   X      Composite materials. Glued metals, delamination, and coatings.
20.3 Moire contour                          -   -   -   -   -   -   -   -   -   X   -   -   -   X      Topography
20.4 Holographic interferometry (HI)        -   -   -   -   -   -   -   -   X   -   -   -   -   X      Lack of adhesion, material defects, thin samples
20.5 Computerized tomography (CT)           X   -   -   -   -   -   -   -   -   -   -   -   -   X      Annual rings, knots, moisture, concrete column cross sections
20.6 Positron annihilation                  -   -   -   -   -   -   -   X   -   -   -   -   -   X      Voids in metals. Fatigue in titanium
21 Noise measurements                       -   -   -   -   -   -   -   -   -   -   -   -   -   X      Noise level, bearing checks
22 Pattern recognition                      X   X   X   X   X   -   -   -   -   X   X   X   X   -
23 P-scan                                   X   X   X   X   X   -   -   -   -   -   X   -   -   X      Weld inspection, stress corrosion, corrosion topography, creep defects. Full documentation
24 Pinhole                                  -   -   -   -   -   -   -   -   -   -   -   -   -   X      Coatings, high/low voltage
25 Pressure testing                         X   -   X   -   -   X   -   -   -   X   -   -   -   -      Including vacuum testing. See also leak
26 Radiography                              X   X   X   X   X   X   -   -   -   -   X   X   X   X      Check of joints, geometry, laminations, reinforced concrete and corrosion/erosion
27 Replica technique                        X   X   X   -   -   -   -   X   -   X   -   -   -   X      Surface microstructure, crack type, wear grooves, topography
28 Spectrometric oil analysis program       -   X   -   -   -   -   -   -   -   -   -   -   -   -      Lubricated mechanical systems
29 Strain gauge technique                   -   -   -   -   -   -   -   -   X   X   -   -   -   -      Weight, pressure, oscillation
30 Stroboscopy                              X   X   X   -   -   -   -   -   -   -   -   -   -   X      Visual condition monitoring, rotation direction and rate
31 Test coupons                             -   -   -   X   X   -   -   -   -   -   -   -   -   -      Average corrosion rate
32 Thermography                             X   -   -   X   -   X   -   -   -   -   -   X   -   X      Surface temp., bearing pressure, moisture, energy loss
33 Ultrasonic leak detection                -   -   -   -   -   X   -   -   -   -   -   -   -   X      Electrical discharge, flow
34 Ultrasonics                              X   X   X   X   X   X   -   X   X   X   X   -   -   -      Including sound attenuation
35 Vibration monitoring                     X   X   X   -   -   -   -   -   -   -   -   -   -   X      Machinery, including bearings, gears, turbines, centrifuges, etc.
36 Visual inspection                        X   X   X   X   X   X   X   -   -   X   -   X   X   -      Spark pattern & chemical analysis
37 X-ray crawlers                           -   -   -   -   -   -   -   -   -   -   -   -   -   X      Checking welds inside pipes
38 X-ray diffraction                        -   -   -   -   -   -   -   -   X   -   -   -   -   -      Measurement of residual stresses

Table 5-9. Information needed for RCM

• The types of failures that can occur in the product

• The failure characteristics of the items that make up the product being analyzed

• The nature of the failures (hidden, evident, safety, operational, etc.)

• The capabilities of the maintenance organization

• The maintenance concept

• A thorough understanding of operation



(2) Other information. Other important sources of information for the RCM analysis include 
Reliability Block Diagrams (RBDs), Functional Block Diagrams, system requirements documents, 
descriptions of system applications, technical manuals/drawings/layouts, and indenture level 
identification system. 

(3) Sources. To provide the needed information, various sources must be exploited. One of the 
most obvious sources is the body of analyses conducted as part of the design process. These include the 
Failure Mode and Effects Analysis (FMEA) or Failure Modes, Effects, and Criticality Analysis 
(FMECA), Fault Tree Analysis (FTA), maintainability analysis, and so forth. 

(4) FMEA. The FMEA can be a primary source of much of the information needed for the RCM 
analysis. Figure 5-1 shows excerpts of the form prescribed in the Automotive Industry Group standard on 
FMEA/FMECA. Upon examining figure 5-1, it is obvious that the data in many of the columns can be 
directly used for the RCM analysis. The columns having data most applicable for the RCM analysis are 
shaded. In addition to those shown, columns can be added for functions, functional failure, compensating 
provisions, and three columns for failure effects: local effects, next higher level, and end effects. Other 
chart formats for recording FMECA data can be used, as shown in figure 5-1a. Further information is
available in TM 5-698-4, Failure Modes, Effects and Criticality Analysis (FMECA) for C4ISR Facilities.

Form from the Automotive Industry Group Standard on FMEA

[Form column headings: Item/Function; Potential Failure Mode(s); Potential Effect(s) of Failure; SEV;
CLASS; Potential Cause(s)/Mechanisms of Failure; OCC; Current Design Controls; DET; RPN;
Recommended Action(s); Responsibility & Target Completion Date; Action Results (Action Taken,
New Sev, New Occ, New Det, RPN).]

Figure 5-1. Data elements from FMEA that are applicable to RCM analysis.

Legend: SEV - Severity of failure effect
OCC - Probability of occurrence
DET - Method of detection
RPN - Risk Priority Number

A completed chart may be similar to the following example: 



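Column headings: ITEM NUMBER; ITEM/FUNCTIONAL ID; POTENTIAL FAILURE MODES; FAILURE MECHANISM 
(CAUSE); SEVERITY; FAILURE RATE λp (SOURCE); DETECTION METHOD; CRITICALITY NUMBER (C) 

Item number 130.2 
    Item/functional ID: Cooling Tower #1 / maintain a water temp of 75°F 
    Potential failure modes: Fan failure 
    Failure mechanism (cause): Motor winding open; loss of power to motor 
    Severity: 3 
    Failure rate λp (source): 10.0518 x 10^-6 
    Detection method: 3 
    Criticality number (C): 99.05 x 10^-5 

Item number 310.1 
    Item/functional ID: Air Handler / provide 3200 cfm of air to room, maintain room at 72°F 
    Potential failure modes: Provide airflow at a rate less than 3200 cfm 
    Failure mechanism (cause): Reduced motor output (winding degradation); belt slippage (belt too 
        loose); loose sheave; dirty intake filter 
    Severity: 3 
    Failure rate λp (source): 1.7657 x 10^-6 
    Detection method: 2 
    Criticality number (C): 1.06 x 10^-5 

Item number 310.0 
    Item/functional ID: Air Handler / provide 3200 cfm of air to room, maintain room at 72°F 
    Potential failure modes: Maintain air at a temp higher than 72°F 
    Failure mechanism (cause): Dirty coils 
    Severity: 3 
    Failure rate λp (source): 1.7657 x 10^-6 
    Detection method: 7 
    Criticality number (C): 3.7 x 10^-5 

DA FORM 7610 AUG 2006 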



Figure 5-1a. Example of Failure Modes and Effects Analysis worksheet; DA Form 7610. 

Where: 

• Failure modes are the generic manner in which an item failed 

• Failure mechanisms are the specific circumstances that allowed the given failure mode to occur 

• Severity is the assessment of the consequence of a given failure 

• Occurrence is the probability of the failure occurring (failure rate) 
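
The Risk Priority Number (RPN) named in the legend of figure 5-1 is conventionally the product of the severity, occurrence, and detection rankings. The sketch below assumes that convention and a 1-to-10 ranking scale; the function name and the sample rankings are hypothetical and are not taken from this manual.

```python
def risk_priority_number(sev: int, occ: int, det: int) -> int:
    """Return SEV x OCC x DET, assuming each ranking is an integer from 1 to 10."""
    for name, value in (("sev", sev), ("occ", occ), ("det", det)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} ranking must be between 1 and 10, got {value}")
    return sev * occ * det


# Hypothetical rankings (SEV, OCC, DET) for two failure modes; illustrative values only.
modes = {"fan failure": (7, 3, 4), "dirty coils": (4, 5, 2)}

# Rank the failure modes from highest to lowest RPN.
ranked = sorted(modes, key=lambda m: risk_priority_number(*modes[m]), reverse=True)
print(ranked)
```

A ranking of this kind only orders failure modes for attention; it does not replace the decision logic that follows.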

c. Apply RCM decision logic. The overall decision logic for applying the RCM methodology is 
depicted in figure 5-2. The decision logic represented in this figure is adapted from that used in the 
Reliability Analysis Center's Maintenance Steering Group-3 (MSG-3). The most significant difference is in 
the portions of the tree labeled ②, ④, ⑦, and ⑧. MSG-1 through MSG-3 (see paragraph 1-6) used the 
term "safety" for these portions of the tree. 

(1) Safety. Obviously, safety is of paramount importance to the airline industry, as it is in other 
industries, such as the nuclear power industry. 

(2) Other Critical Considerations. Many industries have concerns that are as important, or nearly 
so, as safety considerations. The petroleum and chemical industries, for example, are subject to severe 
economic and even criminal penalties under Federal statutes for events in which the environment is 
polluted. For other industries, failures that result in the violation of other Federal, state, or local statutes, 
or in other unacceptable consequences, may be treated as seriously as safety-related failures are in the 
airline industry. For that reason, in the portions of the tree labeled ②, ④, ⑦, and ⑧, the term "hazardous 
effects" is used rather than "safety effects". (The circled numbers in this and following discussions refer 
to the correspondingly numbered portions of the referenced figures.) When applying RCM decision logic, 
it is important to consider the criticality of the item being analyzed. Highly critical items can directly 
compromise mission goals, and their risk should be heavily mitigated. It is important to recognize single 
point failures, as well as their functional contribution to critical and non-critical systems, and to prescribe 
maintenance approaches accordingly. Conversely, items recognized as clearly non-critical may be allowed 
to run to failure, especially those that are inherently very reliable. This viewpoint should also be 
incorporated into the use of RCM decision logic to build an intelligent and cost-effective maintenance 
strategy. 

d. Use of Logic Tree. As can be seen from figure 5-2, the decision logic tree consists of a series of 
Yes-No questions. The answers to these questions lead to a specific path through the tree. The questions 
are structured to meet the objective of the RCM analysis: ensure the safe (non-hazardous) and 
economical operation and support of a product while maximizing the availability of that product. This 
objective is met by selecting preventive maintenance (PM) tasks when appropriate, by redesign, by some 
combination of PM and redesign, and by corrective maintenance (CM) when PM is not applicable or not 
effective. 

(1) The first question asked is "Is the occurrence of a functional failure evident to the operator (or 
user) during normal use?" A "No" answer means that the failure is hidden, and the analyst is directed to 
⑦ in the tree. The portion of the tree below ⑦ is discussed under paragraphs 5-5h and 5-5i. A "Yes" 
answer means that the failure can be observed or is made known to the operator/user, in which case the 
analyst is directed to ②. 

(2) At ②, the question is "Does the (evident) functional failure or secondary damage resulting from 
the functional failure have a direct and hazardous effect?" A "Yes" answer directs the analyst to ④. The 
portion of the tree below ④ is discussed under paragraph 5-5e. A "No" answer directs the analyst to ③. 
The portion of the tree below ③ is discussed under paragraphs 5-5f and 5-5g. 
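
The top-level Yes-No questions can be sketched as a small classification function. This is an illustrative reading of the decision logic, not part of the manual: the enum, function, and parameter names are hypothetical, and the branch names follow the paragraph titles 5-5e through 5-5i.

```python
from enum import Enum


class Branch(Enum):
    EVIDENT_HAZARDOUS = "evident failure - hazardous effects"
    EVIDENT_OPERATIONAL = "evident failure - operational effects"
    EVIDENT_ECONOMIC = "evident failure - economic effects"
    HIDDEN_HAZARDOUS = "hidden failure - hazardous effects"
    HIDDEN_NON_HAZARDOUS = "hidden failure - non-hazardous effects"


def classify(evident: bool, hazardous: bool, affects_operation: bool = False) -> Branch:
    """Answer the top-level questions and return the branch of the tree to analyze.

    evident: is the functional failure evident to the operator during normal use?
    hazardous: for an evident failure, does it (or its secondary damage) have a
        direct and hazardous effect; for a hidden failure, does it have a
        hazardous effect in combination with one additional failure?
    affects_operation: does an evident, non-hazardous failure have a direct and
        adverse effect on operating capability?
    """
    if not evident:
        return Branch.HIDDEN_HAZARDOUS if hazardous else Branch.HIDDEN_NON_HAZARDOUS
    if hazardous:
        return Branch.EVIDENT_HAZARDOUS
    if affects_operation:
        return Branch.EVIDENT_OPERATIONAL
    return Branch.EVIDENT_ECONOMIC


print(classify(evident=True, hazardous=False, affects_operation=True).value)
```

Each returned branch then has its own series of task questions, described in the paragraphs that follow.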
① Is the occurrence of a functional failure evident to the operator during normal use? 
    YES: Evident functional failure. Go to ②. 
    NO: Hidden functional failure. Go to ⑦. 

② Does the functional failure or secondary damage resulting from this failure have a direct and 
hazardous* effect? 
    YES: Go to ④. 
    NO: Go to ③. 

③ Does the functional failure have a direct and adverse effect on operating capability? 
    YES: Go to ⑤. 
    NO: Go to ⑥. 

④ Hazardous effects: task(s) required to avoid hazardous effects 
    4A  Is a lubrication or servicing task applicable & effective? 
        YES: Lubrication or servicing task.  NO: Go to 4B. 
    4B  Is an inspection or functional check to detect degradation of function applicable & effective? 
        YES: Inspection or functional check.  NO: Go to 4C. 
    4C  Is a restoration task to reduce failure rate applicable & effective? 
        YES: Restoration task.  NO: Go to 4D. 
    4D  Is a discard task to avoid failures or reduce failure rate applicable & effective? 
        YES: Discard task.  NO: Go to 4E. 
    4E  Is there a task or combination of tasks that is applicable & effective? 
        YES: Task or combination of tasks most effective must be done.  NO: Redesign is mandatory. 

⑤ Operational effects: task(s) desirable if risk is reduced to an acceptable level 
    5A  Is a lubrication or servicing task applicable & effective? 
        YES: Lubrication or servicing task.  NO: Go to 5B. 
    5B  Is an inspection or functional check to detect degradation of function applicable & effective? 
        YES: Inspection or functional check.  NO: Go to 5C. 
    5C  Is a restoration task to reduce failure rate applicable & effective? 
        YES: Restoration task.  NO: Go to 5D. 
    5D  Is a discard task to avoid failures or reduce failure rate applicable & effective? 
        YES: Discard task.  NO: Redesign may be desirable. 

⑥ Economic effects: task(s) desirable if cost is less than repair costs 
    6A  Is a lubrication or servicing task applicable & effective? 
        YES: Lubrication or servicing task.  NO: Go to 6B. 
    6B  Is an inspection or functional check to detect degradation of function applicable & effective? 
        YES: Inspection or functional check.  NO: Go to 6C. 
    6C  Is a restoration task to reduce failure rate applicable & effective? 
        YES: Restoration task.  NO: Go to 6D. 
    6D  Is a discard task to avoid failures or reduce failure rate applicable & effective? 
        YES: Discard task.  NO: Redesign may be desirable. 

*Hazardous effects include property damage, injury or death to operators or other people, violation of 
Federal environmental or health statutes, and other effects determined by the company or industry to be 
serious or catastrophic. 



Figure 5-2. RCM decision logic tree (adapted from MSG-3). 

⑦ Hidden functional failure: Does the combination of a hidden functional failure and one additional 
failure of a system-related or backup function have a hazardous* effect? 
    YES: Go to ⑧. 
    NO: Go to ⑨. 

⑧ Hazardous effects: task(s) required to ensure non-hazardous operation 
    8A  Is a lubrication or servicing task applicable & effective? 
        YES: Lubrication or servicing task.  NO: Go to 8B. 
    8B  Is a check to verify operation applicable & effective? 
        YES: Operational/visual check.  NO: Go to 8C. 
    8C  Is an inspection or functional check to detect degradation of function applicable & effective? 
        YES: Inspection or functional check.  NO: Go to 8D. 
    8D  Is a restoration task to reduce failure rate applicable & effective? 
        YES: Restoration task.  NO: Go to 8E. 
    8E  Is a discard task to avoid failures or reduce failure rate applicable & effective? 
        YES: Discard task.  NO: Go to 8F. 
    8F  Is there a task or combination of tasks that is applicable & effective? 
        YES: Task or combination of tasks most effective must be done.  NO: Redesign is mandatory. 

⑨ Non-hazardous effects: task(s) desirable to ensure availability is such that economic effects of 
multiple failures are avoided 
    9A  Is a lubrication or servicing task applicable & effective? 
        YES: Lubrication or servicing task.  NO: Go to 9B. 
    9B  Is a check to verify operation applicable & effective? 
        YES: Operational/visual check.  NO: Go to 9C. 
    9C  Is an inspection or functional check to detect degradation of function applicable & effective? 
        YES: Inspection or functional check.  NO: Go to 9D. 
    9D  Is a restoration task to reduce failure rate applicable & effective? 
        YES: Restoration task.  NO: Go to 9E. 
    9E  Is a discard task to avoid failures or reduce failure rate applicable & effective? 
        YES: Discard task.  NO: Redesign is desirable. 

*Hazardous effects include property damage, injury or death to operators or other people, 
violation of Federal environmental or health statutes, and other effects determined by the 
company or industry to be serious or catastrophic. 



Figure 5-2. RCM decision logic tree (adapted from MSG-3) (Cont'd). 

e. Evident Failure - Hazardous Effects. The portion of the decision logic tree that deals with 
situations where an evident functional failure has hazardous effects is shown in figure 5-3. 

(1) This portion of the tree steps the analyst through a series of questions intended to identify any 
and all PM tasks that will reduce to an acceptable level the probability of occurrence of the functional 
failure that results in the effects, reduce the effects to purely operational or economic effects, or result in a 
combination of these two improvements. 

(2) If none of the PM tasks listed is either applicable or effective, then redesign is mandatory. The 
reason is that effects categorized as "hazardous" are unacceptable. Consequently, when PM cannot fulfill 
any of the objectives listed, we must redesign the product to eliminate the mode of failure that causes the 
hazardous effects, reduce to an acceptable level the probability of occurrence of the functional failure that 
results in the effects, or achieve a combination of these two improvements. 
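
The outcome of this branch can be sketched as follows. The sketch is a simplified illustration with hypothetical names: it collects every task type the analyst judged applicable and effective and leaves the choice of the most effective task or combination to the analyst. Treating an unanswered task type as "No" is an assumption of the sketch.

```python
# Task types in the order in which the branch asks about them.
TASK_QUESTIONS = [
    "lubrication or servicing",
    "inspection or functional check",
    "restoration",
    "discard",
]


def hazardous_branch(applicable_and_effective: dict[str, bool]) -> list[str]:
    """Return the outcome for an evident failure with hazardous effects.

    applicable_and_effective maps a task type to the analyst's Yes/No answer.
    Task types missing from the mapping are treated as "No".
    """
    selected = [t for t in TASK_QUESTIONS if applicable_and_effective.get(t, False)]
    if not selected:
        # No PM task is applicable and effective, so the hazard must be designed out.
        return ["redesign is mandatory"]
    return selected


print(hazardous_branch({"inspection or functional check": True, "discard": True}))
print(hazardous_branch({"lubrication or servicing": False}))
```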






④ Hazardous effects: task(s) required to ensure non-hazardous operation 
    4A  Is a lubrication or servicing task applicable & effective? 
        YES: Lubrication or servicing task.  NO: Go to 4B. 
    4B  Is an inspection or functional check to detect degradation of function applicable & effective? 
        YES: Inspection or functional check.  NO: Go to 4C. 
    4C  Is a restoration task to reduce failure rate applicable & effective? 
        YES: Restoration task.  NO: Go to 4D. 
    4D  Is a discard task to avoid failures or reduce failure rate applicable & effective? 
        YES: Discard task.  NO: Go to 4E. 
    4E  Is there a task or combination of tasks that is applicable & effective? 
        YES: Task or combination of tasks most effective must be done.  NO: Redesign is mandatory. 

Figure 5-3. Evident failure - hazardous effects. 

f. Evident Failure - Operational Effects. The portion of the decision logic tree that deals with 
situations where an evident functional failure has a direct and adverse effect on operating capability is 
shown in figure 5-4. This portion of the tree steps the analyst through a series of questions intended to 
identify any and all PM tasks that will reduce the risk of failure to an acceptable level. If none of the PM 
tasks listed is either applicable or effective, then redesign may be desirable. The cost of a functional 
failure that results in operational effects includes both the cost of the PM and the economic cost incurred 
as a result of the end system not completing a mission or not being able to perform its function(s). 

(1) If the costs exceed the cost to redesign the product, redesign is economically justified. The 
purpose of the redesign would be to eliminate the mode of failure that causes the operational effects, 
reduce to an acceptable level the probability of occurrence of the functional failure that results in the 
effects, or some combination of these. 

(2) Even if redesign is economically justified, other considerations, such as schedule, may outweigh 
the advantages gained. 
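
The economic test described in this paragraph reduces to a single comparison. The sketch below is illustrative only: the function name, parameter names, and dollar figures are hypothetical, and it assumes all three amounts are estimated over the same period.

```python
def redesign_economically_justified(pm_cost: float,
                                    lost_function_cost: float,
                                    redesign_cost: float) -> bool:
    """True when the cost of the functional failure exceeds the cost to redesign.

    pm_cost and lost_function_cost together make up the cost of a functional
    failure with operational effects. All amounts are assumed to cover the same
    period (for example, the remaining service life).
    """
    return pm_cost + lost_function_cost > redesign_cost


# Hypothetical figures: PM and lost-mission costs over the remaining life vs. redesign.
print(redesign_economically_justified(40_000.0, 85_000.0, 100_000.0))
```

A result of True only means redesign passes the cost test; other considerations, such as schedule, may still outweigh it.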
⑤ Operational effects: task(s) desirable if risk is reduced to an acceptable level 
    5A  Is a lubrication or servicing task applicable & effective? 
        YES: Lubrication or servicing task.  NO: Go to 5B. 
    5B  Is an inspection or functional check to detect degradation of function applicable & effective? 
        YES: Inspection or functional check.  NO: Go to 5C. 
    5C  Is a restoration task to reduce failure rate applicable & effective? 
        YES: Restoration task.  NO: Go to 5D. 
    5D  Is a discard task to avoid failures or reduce failure rate applicable & effective? 
        YES: Discard task.  NO: Redesign may be desirable. 

Figure 5-4. Evident failure - operational effects. 

g. Evident Failure - Economic Effects. The portion of the decision logic tree that deals with situations 
where an evident functional failure has only an economic effect is shown in figure 5-5. This portion of 
the tree steps the analyst through a series of questions intended to identify any and all PM tasks that are 
desirable if their costs are less than the cost of repair. If none of the PM tasks listed is either applicable or 
effective, then redesign may be desirable. Again, the decision to redesign or not redesign is one of 
economics. If the cost of redesign is less than the cost of the economic effects of the failure, then 
redesign may be desirable. 
Otherwise, redesign is not justified. 

⑥ Economic effects: task(s) desirable if cost is less than repair costs 
    6A  Is a lubrication or servicing task applicable & effective? 
        YES: Lubrication or servicing task.  NO: Go to 6B. 
    6B  Is an inspection or functional check to detect degradation of function applicable & effective? 
        YES: Inspection or functional check.  NO: Go to 6C. 
    6C  Is a restoration task to reduce failure rate applicable & effective? 
        YES: Restoration task.  NO: Go to 6D. 
    6D  Is a discard task to avoid failures or reduce failure rate applicable & effective? 
        YES: Discard task.  NO: Redesign may be desirable. 

Figure 5-5. Evident failure - economic effects. 

h. Hidden Failure - Hazardous Effects. The portion of the decision logic tree that deals with 
situations where a hidden functional failure has a hazardous effect in combination with another failure is 
shown in figure 5-6. This portion of the tree steps the analyst through a series of questions intended to 
identify any and all PM tasks that are required to ensure non-hazardous operation. The tasks are effective 
if they reduce to an acceptable level the probability of occurrence of the functional failure that results in 
the effects, reduce the effects to purely operational or economic effects, or result in a combination of 
these. 

(1) If none of the PM tasks listed is either applicable or effective, then redesign is mandatory. The 
reason is that effects categorized as "hazardous" are unacceptable. Consequently, when PM cannot fulfill 
any of the objectives listed, we must redesign the product to eliminate the mode of failure that causes the 
hazardous effects, reduce to an acceptable level the probability of occurrence of the functional failure that 
results in the effects, or achieve a combination of these. 

(2) Note that by redesigning to make the failure evident, the effects might be reduced to purely 
economic or operational. 

Figure 5-6. Hidden failure - hazardous effects. 

i. Hidden Failure - Non-hazardous Effects. The portion of the decision logic tree that deals with 
situations where a hidden functional failure has a non-hazardous effect is shown in figure 5-7. This 
portion of the tree steps the analyst through a series of questions intended to identify any and all PM tasks 
that are desirable to ensure availability is sufficiently high to avoid the economic effects of multiple 
failures. If none of the PM tasks listed is either applicable or effective, then redesign is desirable. 

Figure 5-7. Hidden failure - non-hazardous effects. 

j. Package final maintenance program. The result of the RCM analysis will be a set of preventive 
maintenance (PM) tasks and, by default, a set of corrective maintenance (CM) tasks. PM will consist of 
on-condition and scheduled maintenance. 

(1) Frequency of tasks. The frequency with which each of the scheduled PM tasks must be 
performed will no doubt vary from item to item. It is also probable that many of these tasks may be 
grouped and performed together at some calendar or operating time interval. The process of grouping the 
scheduled tasks into sets of tasks to be performed at some prescribed time is called "packaging" the 
maintenance program. 

(2) Example of packaging. Suppose that, for a given product, the scheduled tasks shown in 
table 5-10 were identified. One way to package these tasks is shown in table 5-11. Note that at 
the 100, 200, 300, etc. hour points, all of the tasks except the overhaul task are performed. This example 
is purposely over-simplified, and many other factors may (and probably will) have to be considered when 
packaging the tasks. The point is that by packaging PM tasks, we use our maintenance resources as 
effectively as possible and minimize the downtime of the product for PM. 

Table 5-10. Example of identified tasks 



• Three visual inspections: A to be conducted every 45 hours of operation, B to be conducted every 
52 hours of operation, and C to be conducted every 105 hours of operation 

• A lubrication performed every 55 hours of operation 

• A non-destructive inspection every 100 hours of operation 

• An overhaul task performed when a stated operating characteristic is out of limits 

• A hard-time replacement task every 60 hours of operation 



Table 5-11. Packaging the tasks from table 5-10 

• Conduct the following PM every 50 operating hours (i.e., at 50, 100, 150, 200, etc.) 

- Visual inspections A and B 

- Lubrication 

- Hard-time replacement 

• Conduct the following PM every 100 operating hours (i.e., at 100, 200, 300, etc.) 

- Visual inspection C 

- Non-destructive inspection 

• Perform overhaul task whenever the operating characteristic goes out of limits 
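
The packaging in table 5-11 can be reproduced with a short sketch. It assumes a simple rule that is not stated in this manual: each scheduled task is moved to whichever package interval (50 or 100 hours) is closest to its own interval. The function names are hypothetical, and the on-condition overhaul task is left out because it has no fixed interval.

```python
from collections import defaultdict

# Scheduled task intervals from table 5-10, in operating hours.
TASK_INTERVALS = {
    "visual inspection A": 45,
    "visual inspection B": 52,
    "visual inspection C": 105,
    "lubrication": 55,
    "non-destructive inspection": 100,
    "hard-time replacement": 60,
}


def package(task_intervals: dict[str, int], package_intervals: list[int]) -> dict[int, list[str]]:
    """Assign each task to the package interval closest to its own interval."""
    packages: dict[int, list[str]] = defaultdict(list)
    for task, hours in task_intervals.items():
        nearest = min(package_intervals, key=lambda p: abs(p - hours))
        packages[nearest].append(task)
    return dict(packages)


def tasks_due(packages: dict[int, list[str]], operating_hours: int) -> list[str]:
    """List every packaged task that falls due at the given operating-hour point."""
    return [task for interval, tasks in sorted(packages.items())
            if operating_hours % interval == 0 for task in tasks]


packages = package(TASK_INTERVALS, [50, 100])
print(packages[50])
print(tasks_due(packages, 100))
```

Note that the nearest-interval rule lengthens some intervals (for example, 45 to 50 hours) and shortens others (60 to 50 hours); in practice each shift would have to be checked against the failure characteristics of the item before it is accepted.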



k. Continuously improve the maintenance program. Given the possibility for errors in the initial 
maintenance program, it is prudent to implement the RCM process as an on-going effort, one requiring 
perpetual evaluation and adjustment, as depicted in figure 2-1. The process for continuously improving 
the RCM-based maintenance program consists of Maintenance Audit, Trend Analysis, and Life 
Exploration. The purpose of this process is to continuously improve the initial maintenance program 
developed using the RCM concept. 

(1) The initial maintenance program. The maintenance program that is developed based on the 
RCM analysis done prior to the first product being delivered to the customer is the initial maintenance 
program. This initial program will have been based on the best information that was available at the time 
the analysis was performed. One of the critical pieces of information is the underlying failure distribution 
for each item. The information used in the initial RCM analysis was based on a mix of analysis and test 
results. When "off-the-shelf" items are used in the product, the information can include actual field 
experience. It must be recognized, however, that some of the information will not be 100% "accurate." 

(2) Maintenance audit. Auditing the maintenance performed in actual service provides the data 
needed to refine and improve the maintenance program. In analyzing the data, the maintenance analysts 
and planners attempt to address the technical content of the program, intervals for performing tasks, 
packaging of tasks, training, the maintenance concept, and the support infrastructure. 

(a) In addressing technical content, analysts and planners must determine if the current maintenance tasks 
cover all identified failure modes and result in the desired/required level of reliability. Failure modes may have been 
missed or the current maintenance tasks may not be effectively addressing identified failure modes. The latter may 
result from incorrectly identifying the underlying failure probability distribution function. Much of this information 
can be confirmed or updated through a reliability assessment. Table 5-12 lists the types of questions that can be 
answered by such an assessment. 

Table 5-12. Typical questions addressed by a reliability assessment 

• Were assessments of useful life too conservative? 

• Have replacement intervals been made too short? 

• Is wearout occurring later or earlier than anticipated? 

• Have the operating conditions or concept changed? 

• Has the reliability performance been as expected? 

• Have any new failure modes been uncovered? 

• Are failure modes identified in development occurring with the expected frequency and pattern (i.e., 
underlying pdf of failures)? 

• Have any modifications to the product been made or are any planned that would add or delete failure 
modes, change the effects of a given failure mode, or require additional or different PM tasks? 

• Were the consequences of failures forecast during development adequately identified? 

(b) In addressing performance interval, analysts and planners must determine if the intervals for 
PM tasks result in decreased resistance to failure. Most often, the objective is to extend the interval as 
much as possible, without compromising safety, when doing so will reduce costs. Initial intervals are 
frequently set at conservative levels. 

(c) In addressing task packaging, analysts and planners must determine if like tasks with similar 
periodicity are or can be grouped together to minimize downtime and maximize effectiveness. Lessons 
learned during actual operation and maintenance may make it necessary to revise the initial packaging. 

(d) The analysts and planners should evaluate if available personnel, as currently being trained 
and using available tools and data, are effectively performing the identified PM tasks. If not, changes to 
training, procedures, tools, and so forth should be considered. 

(e) The analysts and planners should determine if the maintenance concept for the product is 
effective or should be revised. 

(f) The analysts and planners should address the adequacy and responsiveness of the support 
infrastructure. If the performance of the infrastructure is not as anticipated, recommendations regarding 
policy, spares levels, and other factors should be considered. 

(3) Trend analysis. By collecting data on failures, time to failure, effectiveness of maintenance 
tasks, and costs of maintenance, trends can be identified. The objective of trend analysis is to anticipate 
problems and adjust the maintenance program to prevent their occurrence. For the RCM effort, two 
factors typically addressed by trend analysis are the rate of occurrence of failures and maintenance costs. 

(a) For trending purposes, at least three data points are needed. The first two establish the trend 
(positive or negative) and the third serves as confirmation. (In control charting used for quality control, a 
trend is said to exist when 7 consecutive points continue to rise or fall.) However, when measurements 
are based upon sample surveys over time, data at different points in time may vary because the underlying 
phenomenon has changed (i.e., a trend exists) or because of sampling error (i.e., the underlying 
phenomenon has not changed at all). It is not an easy task to sort out the one from the other. 

(b) Statistical methods can be used to determine if a trend actually exists. For example, if a 
system failure rate is actually changing (i.e., it is not constant), the Laplace Statistic will show that a trend 
exists at a certain level of confidence. 

(c) In addition to trend analysis, impending failures can be detected using pattern recognition, 
data comparison, tests against limits and ranges, correlation, and statistical process analysis. 
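
As an illustration of the statistical check mentioned in (b), the sketch below computes the Laplace statistic in its common time-truncated form: the mean failure time is compared with the midpoint of the observation period and scaled so that, under a constant rate of occurrence of failures, the result is approximately standard normal. The function name and the failure times are hypothetical, and the 1.96 threshold corresponds to a two-sided test at roughly 95% confidence.

```python
import math


def laplace_statistic(failure_times: list[float], observation_end: float) -> float:
    """Laplace trend statistic for failures observed over (0, observation_end].

    Positive values indicate failures clustering late in the period (a rising
    failure rate); negative values indicate early clustering (a falling rate).
    """
    n = len(failure_times)
    if n == 0 or observation_end <= 0:
        raise ValueError("need at least one failure and a positive observation period")
    mean_time = sum(failure_times) / n
    return (mean_time - observation_end / 2) / (observation_end * math.sqrt(1 / (12 * n)))


# Hypothetical cumulative operating hours at each failure, observed for 1,000 hours.
times = [310.0, 520.0, 650.0, 760.0, 840.0, 910.0, 960.0, 990.0]
u = laplace_statistic(times, 1000.0)
verdict = "trend indicated" if abs(u) > 1.96 else "no trend shown"
print(round(u, 2), verdict)
```

With only a handful of failures the normal approximation is rough, so a borderline value is better treated as a prompt for further data collection than as a conclusion.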

(4) Life exploration. The process of collecting and analyzing in-service or operational reliability 
data to update the maintenance program is called Life (or Age) Exploration. The data that should be 
collected during Life Exploration includes historical field service data. Historical field service data 
typically describes three kinds of maintenance activities: corrective maintenance actions, preventive 
maintenance actions, and service maintenance actions. 

(a) Historical corrective maintenance data. Corrective maintenance actions occur in response to 
an operational failure of the system. Corrective maintenance actions are always unscheduled, unwanted, 
inconvenient, and random. 

(b) Historical preventive maintenance data. Preventive maintenance actions occur in accordance 
with a schedule and are intended to minimize the need for corrective maintenance actions. 

(c) Historical service maintenance data. Service maintenance actions are those tasks performed to 
replenish expended parts and supplies required to operate a system. Many assets require adjustment, 
replenishment of supplies, lubrication, and cleaning. 

5-6. Specific considerations for implementing RCM for C4ISR facilities 

a. Current versus new facilities. Many C4ISR facilities were built and the mechanical and electrical 
equipment developed and installed without an RCM analysis having been conducted. Implementing 
RCM for an existing C4ISR facility, when the current PM program was not based on RCM, is different 
from implementing it on a facility, new or old, for which the PM program was based on RCM. 

(1) Current PM program in place. Of course, a program of preventive maintenance will already be 
in place for an existing facility. Without an RCM analysis, the PM program was probably based on past 
programs. Indications that the PM program is inefficient or ineffective are an excessive number of 
corrective maintenance actions (with an associated low facility availability), or an extremely large number 
of required PM actions that are imposing a very heavy economic penalty. Attempts to change the 
existing PM program may meet with some resistance (see paragraph 5-6c(3)). 

(2) Need for supporting analyses. If an RCM analysis was not originally performed for the facility 
and its systems and equipment, much of the supporting analysis may also have been omitted. If analyses 
such as an FMEA were not conducted, they must be conducted before an RCM-based PM 
program can be developed. For many of the installed systems and equipment, performing an FMEA or 
other analysis may be quite difficult because much of the data may not be available. Either the data was 
not acquired with the systems and equipment (i.e., data rights were not procured), or the data is missing. 
In such cases, engineers will have to use engineering judgment and will require more time to adequately 
analyze the systems and equipment. 

(3) Feasibility of redesign. When following the RCM logic, the path may lead to a 
"redesign is mandatory" or "redesign may be desirable" outcome. Redesign during initial development is 
in itself a sometimes-difficult task. Once a system or piece of equipment is in operation, redesign is even 
more difficult. However, an advantage of a facility is that adding redundancy is less constrained, in terms 
of space and weight, than for other systems. 

b. Training. The RCM process is very disciplined and logical. It involves the integration of many 
different analytical tools, data, experience, and a decision logic tree. Without proper training, those 
assigned the responsibility of implementing RCM will find it difficult to succeed. Training in the RCM 
methodology and the related disciplines must be an essential element of an organization's plan for 
implementing RCM. For C4ISR facilities, especially when maintenance is outsourced (see chapter 6), 
funding must be provided for training to ensure that an RCM analysis is properly performed. Of course, 
training to ensure maintenance is properly performed is also essential. 

c. Pitfalls. In implementing an RCM program in organizations where the concept is new, pitfalls can 
make implementation ineffective. 

(1) Run to failure shock. For many maintenance managers and technicians, allowing an item to run 
to failure runs counter to "conventional wisdom". It is important that they understand the concepts of 
reliability and turn their focus from preventing failures to preserving function. 

(2) Failure to accept the "Preserve Function" principle. Most maintenance personnel traditionally 
have viewed their role as one of preventing failures. To effectively implement an RCM program, it is 
essential that maintenance personnel focus on preserving the function or functions of an item, not 
preventing failures. 

(3) Challenging the Past. Tradition and conventional wisdom remain the principal guidance for 
many maintenance organizations. Challenging past practices almost always invokes strong resistance, 
especially if the new practices are not fully understood. Education is the best way to deal with cultural 
resistance. 

(4) Organization structure. The RCM process requires close coordination and cooperation among 
several groups of people, including but not limited to designers, maintainers, and logistic planners. 
Organizational structures can impede or even prevent the level of cooperation and coordination needed to 
make RCM a success. The concept of integrated process/product teams (IPPTs) is one that facilitates and 
encourages cross-discipline cooperation. 

(5) Threat of reduction in staff. When RCM was first implemented within the airline industry, 
drastic reductions in scheduled maintenance tasks were made possible. Consequently, the number of labor
hours and people required to, for example, conduct structural inspections of an aircraft were significantly
reduced. When a segment of an organization perceives that a new policy or procedure will eliminate their 
jobs, the natural reaction is to fight against the new policy or procedure. However, with vision and 
planning, management can find ways to effectively use the resources freed up by implementing RCM and
minimize the impact on jobs by using normal attrition, cross training, etc. 

(6) Inadequate buy-in. All too often, management implements a new policy or procedure without 
fully supporting that policy or procedure. If either resources or management interest is insufficient, the 
new policy or procedure will probably fall short of expectations. This is especially true for RCM, an 
approach that is often met with skepticism and resistance by the very same people who must help 
implement it. 

(7) Informal procedures. RCM is a very structured, disciplined method of developing a 
comprehensive and effective maintenance program. It cannot be effectively implemented on an informal 
or ad hoc basis. The procedures for implementing an RCM approach within an organization must be 
formal, documented, and managed. 

(8) Inadequate data collection. If the underlying pattern of failures for a given item is unknown, 
one cannot objectively determine if PM should be considered. Without adequate information regarding 
the frequency of failure or the parameters of the failure probability density function, one cannot 
objectively determine when a PM task should be performed. Data that is adequate in both quantity and 
type (e.g., time to failure) is essential to the RCM process. 

5-7. Evaluation of alternatives 

As a result of performing an RCM analysis, alternatives will present themselves. These alternatives fall 
into two categories: Maintenance Tasks and Designs. Both categories are a natural result of the RCM 
analysis. In examining the logic trees in paragraph 5-5, it is obvious that more than one type of 
maintenance task may be applicable and effective for a given failure. Also, in some cases, for example 
where the effects of a failure are hazardous or a hidden failure can occur, redesign is mandatory or 
desirable. How do we determine which tasks to perform? How do we select the "best" design change
(e.g., in the case of failures with hazardous effects) or determine if a design change is cost-effective (e.g.,
in the case of a hidden failure)? We can address these questions using Trade-off Studies, Operational
Analysis, and Cost-Benefit Analysis. 

a. Trade-off studies. Designing a new system or a change to an existing one, even a moderately
complex one, requires a series of compromises. These compromises are inevitable, given the fact that 
requirements often conflict. Design decisions necessary to meet one requirement may result in another 
requirement not being met. For example, strength and fatigue life requirements drive the selection of 
materials and the size (bulk) of structures in one direction. The maximum weight requirement drives 
these same factors in the opposite direction. Systems engineering is the process of selecting design 
solutions that balance the requirements and provide an optimized system. Usually, this balance means 
that some requirements may not be fully met. The process of selecting one design solution over another 
is often referred to as design trade-offs. Trade-off studies consist of the steps shown in table 5-13. 

Table 5-13. Steps in design trades 



• Compare two or more design solutions
• Determine which provides the best results given cost and schedule constraints
• Determine if the system requirements can be met with the selected design solution
• If the system requirements cannot be met, determine the budget and schedule required to
  support a design solution that does allow the system requirements to be met, or re-evaluate the
  requirements

(1) RCM and desired design changes. An RCM analysis may indicate that a change to the design is
required or desirable. In such cases, trade-off studies will probably be needed to determine if a solution 
can be found that is effective (affordability is addressed in a cost-benefit analysis - see paragraph 5-7c). 

(2) RCM and mandatory design changes. When the RCM analysis shows that two or more PM 
tasks are applicable, trade-off studies will be needed to determine which task(s) is (are) most effective. 
Of course, when a specific failure has hazardous effects, redesign is mandatory if no PM tasks are 
effective and applicable. 

b. Operational analysis. To determine if a specific failure has operational effects (but no hazardous 
effects), an analysis of the operational concept is necessary. This analysis addresses the impact of a given 
failure on measures of operational performance. The measures are a function of the type of product and 
how that product is used. For the airline industry, for example, the cost of an operational failure includes 
lost revenue, potential penalties (in the form of compensation to passengers), loss of customer confidence 
and loyalty, and the cost of fixing the failure. For a military organization that operates aircraft, the costs 
might include a decrease in readiness, the inability to fulfill a mission, the cost of reassigning another 
aircraft to replace the original aircraft, and the cost to fix the failure. For a commercial company, the cost 
of an operational failure of a product could include the loss of customer confidence and loyalty, the cost 
of repair under warranty, and possible claims by the customer for lost revenue or other non-hazardous 
effects of the failure. 

c. Cost-benefit analysis. Another type of analysis frequently used whenever one of two or more 
alternatives (design A vs. design B, task 1 vs. task 2, process I vs. process II, etc.) must be selected is a 
cost-benefit analysis (CBA). 

(1) Potential benefits. In a CBA, the potential life-cycle benefits of and life-cycle costs to 
implement a given alternative are compared with those of the other alternatives. One of the most difficult 
steps in a CBA is finding a common basis for comparison. That basis is almost always dollars, since the 
costs of implementing a choice can almost always be directly measured in terms of dollars. Some of the 
benefits of an alternative may be intangible. However, it may be possible to attach a dollar value to even 
these benefits. Benefits to which a dollar value cannot be assigned should be evaluated and assigned 
relative numeric values for comparison purposes. For example, a maximum benefit could be assigned a 
value of 5, an average benefit a value of 3, and a minimum benefit a value of 1. Evaluating and 
comparing benefits that have both dollar values and relative numeric values requires extra effort, but it 
allows all benefits to be considered in the analysis. 

(2) Costs. In a simple CBA, the annual costs of implementing each alternative design change, for 
example, are estimated. For this purpose, the analyst would sum up the estimates of the costs shown in 
table 5-14. The analyst would estimate the annual benefits of the first alternative and then repeat this 
process for each of the other alternative designs.

Table 5-14. Typical costs considered in cost-benefit analysis 



The cost of the labor hours needed to develop the design 
The cost of any additional testing required 
Any differences in materials costs 
Changes in manufacturing costs 
Additional costs due to changes in schedule 
Other costs 

(3) Conversion. The analyst must convert the annual estimates to a common unit of measurement to
properly compare competing alternatives. This conversion is done by discounting future dollar values, 
which transforms future benefits and costs to their "present value." The present value (also referred to as 
the discounted value) of a future amount is calculated using equation 4. 



PV = FV / (1 + i)^n          Equation 4



where: 



PV = Present Value 

FV = Future Value 

i = Interest rate per period 

n = Number of compounding periods 

(4) Comparison. When the costs and benefits for each competing alternative have been discounted, 
the analyst compares and ranks the discounted net value (discounted benefit minus discounted cost) of the 
competing alternatives. In the ideal case one alternative will have the lowest discounted cost and provide 
the highest discounted benefits - it clearly would be the best alternative. More often, however, the choice 
is not so clear-cut, and other techniques must be used to determine which alternative is best. 
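
As an illustration of the conversion and comparison steps, the following sketch applies equation 4 to each year's net benefit and ranks two alternatives by discounted net value. The alternative names, dollar amounts, and 5 percent rate are hypothetical, and discounting the year-k amount over k periods is an assumption of this example, not a rule stated in this manual.

```python
def present_value(future_value, rate, periods):
    """Equation 4: PV = FV / (1 + i)^n."""
    return future_value / (1.0 + rate) ** periods


def discounted_net_value(annual_benefits, annual_costs, rate):
    """Sum the discounted (benefit - cost) for each year.

    Assumption: the amount for year k is discounted over k periods.
    """
    return sum(
        present_value(benefit - cost, rate, year)
        for year, (benefit, cost) in enumerate(
            zip(annual_benefits, annual_costs), start=1
        )
    )


# Hypothetical annual estimates (dollars): (benefits, costs) per alternative.
ALTERNATIVES = {
    "design A": ([50_000, 50_000, 50_000], [90_000, 5_000, 5_000]),
    "design B": ([40_000, 40_000, 40_000], [30_000, 20_000, 20_000]),
}
RATE = 0.05  # hypothetical interest rate per period

net_values = {
    name: discounted_net_value(benefits, costs, RATE)
    for name, (benefits, costs) in ALTERNATIVES.items()
}
# Highest discounted net value first.
ranking = sorted(net_values, key=net_values.get, reverse=True)

for name in ranking:
    print(f"{name}: discounted net value = {net_values[name]:,.0f}")
```

Both hypothetical designs have the same undiscounted net value, yet design B ranks first after discounting because its net benefits arrive earlier.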

(5) Dollar values. Earlier, it was mentioned that some benefits may not be quantifiable in terms of
dollars and may have relative numeric values assigned for comparison purposes. In those cases, these 
numeric values can be used as tie breakers if the cost figures do not show a clear winner among the 
competing alternatives, and if the non-quantifiable benefits are not key factors. If they are key factors, the 
quantified benefits can be converted to scaled numeric values consistent with the non-quantifiable
benefits. The evaluation then consists of comparing the discounted costs and the relative values of the 
benefits for each alternative. When the alternative with the lowest discounted cost provides the highest 
relative benefits, it is clearly the best alternative (the same basic rule used when you have discounted 
benefits). If that is not the case, the evaluation is more complex. 

(6) Numerical values. Finally, if no benefits have dollar values, numerical values can be assigned 
(using some relative scale) to each benefit for each competing alternative. The evaluation and ranking are 
then completed in the manner described in the previous paragraph. 
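
The "clearly the best alternative" rule described above can be sketched as a simple check. The function name, the task names, and the cost and score figures are hypothetical; the scores are assumed to be sums of relative values on the 1-to-5 scale mentioned earlier.

```python
def clear_winner(alternatives):
    """Return the alternative with both the lowest discounted cost and the
    highest relative benefit score, or None when no single alternative
    has both (the more complex case).

    alternatives: dict mapping name -> (discounted_cost, benefit_score).
    Ties are resolved in favor of the first alternative listed, which is
    a simplification of this sketch.
    """
    lowest_cost = min(alternatives, key=lambda name: alternatives[name][0])
    highest_benefit = max(alternatives, key=lambda name: alternatives[name][1])
    return lowest_cost if lowest_cost == highest_benefit else None


# Hypothetical figures: (discounted cost in dollars, total relative benefit score).
print(clear_winner({"task 1": (12_000, 13), "task 2": (15_000, 9)}))
print(clear_winner({"task 1": (12_000, 9), "task 2": (15_000, 13)}))
```

In the second call no alternative is both cheapest and most beneficial, so the function returns None and further evaluation is needed.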

(7) Sensitivity analysis. Sensitivity analysis can be used to test the sensitivity and reliability of the 
results obtained from a CBA. For more information on conducting a CBA and related analysis, see the 
references in appendix A. 

CHAPTER 6
CONTRACTING FOR MAINTENANCE 



6-1. Introduction to maintenance contracting 

Over the past several years, the Department of Defense and the Military Services have made a concerted 
effort to outsource functions that are not inherently governmental. These functions are referred to as 
commercial activities. Although disagreements arise in defining what is not inherently a government 
function, most agree that there are difficulties and challenges in successfully outsourcing any function 
traditionally performed by the military. Among these are determining the approach for C4ISR facilities, 
how best to measure contractor performance, how best to monitor performance, the scope of the contract, 
and the benefits of including contractual incentives. 

a. Background. In the federal government, outsourcing refers to the policy of the government not to 
compete for work that can be performed by the private sector, unless the government performed the work 
previously and the government has proven to be the more economical provider. Work that can be 
performed by the private sector is commonly referred to as a commercial activity. In the federal 
government, outsourcing decisions are made based on inventories of people who perform commercial 
activities. In that respect, the competition between government and the private sector for commercial 
activities, or outsourcing, is not a new concept; it has been around for well over 30 years. 

b. The Reason for outsourcing. In light of declining defense budgets, efforts have been made to 
decrease funds supporting infrastructure and to increase budgetary support for acquisition and 
maintenance of the fleet. This has been referred to as increasing the "Tooth to Tail Ratio." Studies by the 
Center for Naval Analysis and the Defense Science Board suggest that cost savings of 30 percent should 
be possible by outsourcing. Dr. Paul G. Kaminski, former Under Secretary of Defense for Acquisition 
and Technology, described outsourcing as having four distinct benefits. 

(1) Fosters competition. Outsourcing can introduce competitive forces, which drive organizations 
to improve quality, increase efficiency, reduce costs, and better focus on their customer's needs over time. 
For DoD, competition can lead to more rapid delivery of better products and services to the warfighter, 
thereby increasing readiness. 

(2) Can enhance management flexibility. Outsourcing provides commanders with the flexibility to 
determine the appropriate size and composition of the resources needed to complete tasks over time as the 
situation changes. 

(3) Outsourcing takes advantage of economies of scale and specialization. Organizations that 
specialize in specific services generate a relatively larger business volume, which allows them to take 
advantage of scale economies. Often, these economies of scale mean that specialized service firms can 
operate and maintain state-of-the-art systems more cost-effectively than other firms or the government. 
Outsourcing to such firms provides a means for the government to take advantage of technologies and 
systems that the government itself cannot acquire or operate economically. 

(4) Fosters better management focus. In recent years, the nation's most successful companies have 
focused intensively on their core competencies — those activities that give them a competitive edge — and 
outsourced support activities. The activities that have been outsourced remain important to success, but 
are not at the heart of the organization's mission. Business analysts frequently highlight the fact that the
attention of an organization's leaders is a scarce resource that should be allocated wisely. This 
observation is equally true for the Department of Defense and the military services. 

c. Inherently governmental function. An inherently governmental function is one so intimately related to
the public interest as to mandate performance by Government employees. Consistent with the definitions
provided in the Federal
Activities Inventory Reform Act of 1998 and OFPP Policy Letter 92-1, these functions include those 
activities that require either the exercise of discretion in applying Government authority or the use of 
value judgment in making decisions for the Government. Services or products in support of inherently 
Governmental functions. Inherently Governmental functions normally fall into two categories: (1) the act
of governing, i.e., the discretionary exercise of Government authority; and (2) monetary transactions and
entitlements. (Excerpted from OMB Circular A-76).

d. OMB Circular A-76. Office of Management and Budget (OMB) Circular A-76, an executive order 
referred to as A-76, directs the Executive Branch of the government to inventory and schedule for 
competition all commercial activities. By 1989, the process, which frequently took up to five years to 
complete and contributed little to overall savings, fell out of practice. In January 1997, facing a declining 
budget, CNO identified 10,665 in-house positions and 146 in-house activities that would be required to 
compete with the private sector. In January 1998 another 7,227 positions and 137 activities were 
announced, with the total for the fiscal year expected to reach 15,000 positions. 

6-2. Approach for C4ISR facilities 

Before committing to outsourcing C4ISR facility maintenance, the responsible manager must make the 
following determinations. 

a. Determine private sector capability. Determine if private sector firms are able to perform the 
maintenance and meet the C4ISR facility mission. DoD will not consider outsourcing activities that 
constitute its core capabilities (i.e., those considered by DoD and military leaders as essential to being 
prepared to carry out the Department's warfighting mission). 

b. Determine competitive environment. Determine if a competitive commercial market exists for the 
C4ISR facility maintenance. DoD will gain from outsourcing and competition when there is an incentive 
for continuous service improvement. 

c. Determine economic benefit. Determine if outsourcing the facility maintenance results in best value 
for the government and therefore the US taxpayer. Activities will be considered for outsourcing only 
when the private sector can improve performance or lower costs in the context of long-term competition. 

6-3. Measures of performance 

When maintenance is outsourced, the first question is how to measure performance. To determine the 
"best" measure, one must first determine the requirements of the system in question. In the case of C4ISR 
facilities, providing power and environmental control for mission-critical equipment is the primary 
requirement. Furthermore, C4ISR facilities must provide these functions, for the most part, 24 hours
per day, 365 days per year. That is, high availability is absolutely essential. Given that essential
requirement, one of the measures for contractor maintenance should be derived from availability. The 
other should be based on economic considerations. 

a. Availability-related requirement. Even with adequate redundancy, system failures will occur. The 
number of system failures will, of course, be determined by the reliability of all components and 
equipment, use of redundancy, effectiveness of maintenance, and so forth. When a failure does occur, the
job of maintenance is to restore the system to full operation as quickly as possible. Three
availability-related measures of how well this job is done are maximum downtime, maximum time to
restore system, and turn around time.

(1) Maximum downtime. Specifying the maximum downtime (MDT) is specifically intended to
limit the periods of non-operation. A stated period of operation must be stipulated for an MDT
requirement. For facilities, the requirement would normally be stated for each year of operation (e.g.,
MDT shall not exceed 150 hours in any year).

(2) Maximum time to restore system. Related to MDT is maximum time to restore system (MTTRS).
MTTRS is the maximum time allowed to restore the system from any one failure event. In other words,
although the previously stated example of a 150-hour MDT requirement limits the downtime over a
one-year period, it is statistically possible for one failure event to take 50, 75, or even 100 hours to correct.
Such a long downtime, even though it may occur only once or twice a year, is usually unacceptable.
MTTRS limits the downtime that results from any single system failure.
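
The difference between the two requirements can be sketched as a check of one year's downtime log. The 150-hour MDT is the example used above; the 24-hour per-event limit, the function name, and the logged events are hypothetical.

```python
def check_downtime(event_hours, mdt_limit, mttrs_limit):
    """Compare one year of downtime events against both requirements.

    event_hours: downtime (hours) of each failure event in the year.
    mdt_limit:   maximum total downtime allowed in the year (MDT).
    mttrs_limit: maximum downtime allowed for any single event (MTTRS).
    """
    total = sum(event_hours)
    longest = max(event_hours, default=0.0)
    return {
        "total_hours": total,
        "longest_event_hours": longest,
        "meets_mdt": total <= mdt_limit,
        "meets_mttrs": longest <= mttrs_limit,
    }


# Hypothetical log: three failure events in one year.
result = check_downtime([100.0, 20.0, 10.0], mdt_limit=150.0, mttrs_limit=24.0)
print(result)
```

In this hypothetical year the total of 130 hours satisfies the MDT requirement, but the single 100-hour event violates the per-event limit.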

(3) Turn around time. Only a limited number of spares can be bought, especially at the equipment 
or "box" level. Consequently, when a failed piece of equipment must be removed and replaced at the 
facility (organizational) level and repaired at a field or depot level, the length of time it takes to return the 
equipment to the spares supply is important. The shorter the turn around time (TAT), the fewer the 
number of spares that need be purchased, all other factors remaining constant. Usually we are concerned 
about the average and maximum TAT. 
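
A minimal sketch of how TAT might be summarized and related to spares follows. The repair records and removal rate are hypothetical. The pipeline estimate (removal rate multiplied by average TAT) is a common steady-state approximation introduced here as an assumption; it is not a method prescribed by this manual.

```python
from statistics import mean


def tat_summary(tat_days):
    """Return (average TAT, maximum TAT) for a list of turn around times."""
    return mean(tat_days), max(tat_days)


def units_in_repair_pipeline(removals_per_year, average_tat_days):
    """Average number of units away for repair at any time.

    Assumption: steady state, so units in the pipeline equal the
    removal rate multiplied by the average TAT.
    """
    return removals_per_year * average_tat_days / 365.0


# Hypothetical repair records: days from removal to return to spares supply.
average_tat, maximum_tat = tat_summary([20, 25, 30])
print(average_tat, maximum_tat)

# Hypothetical removal rate of 12 units per year.
print(units_in_repair_pipeline(12, average_tat))
print(units_in_repair_pipeline(12, average_tat / 2))  # halving TAT halves the pipeline
```

Under this assumption the number of units tied up in repair scales directly with TAT, which is why a shorter TAT reduces the spares that must be purchased.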

b. Economic requirement. Given fiscal realities and limited funding, economic considerations are also 
important. It is assumed that the contractor who can demonstrate in the proposal that they can provide the 
stipulated maintenance at the required level of performance at the lowest cost will be awarded the 
contract. "Cost" should be more than the price of the contract. The overall life cycle costs that will be 
incurred over the life of the contract should be considered. 

6-4. Scope of the contract 

Providing maintenance support requires labor, parts, spare units, consumables (such as lubrication oil and 
hydraulic fluid, clean-up materials such as rags and absorbent materials to soak up oil spills), test and 
diagnostics equipment, maintenance manuals, and much more. In developing the statement of work for 
outsourcing maintenance of a C4ISR facility, decisions must be made as to what the contractor will 
furnish and what the government will furnish. This process of allocation must be done with care to avoid 
unpleasant surprises after contract signing. An example of the level of detail required for this allocation is 
ordering of national stock numbered items. Will the contractor directly order these parts from DLA and, 
if so, will the contractor be given the necessary authority to do so? On the other hand, the contractor may 
be required to order such parts through a local government supply office. Whichever approach is taken, it 
must be reflected in the scope of the contract. 

6-5. Monitoring performance 

Once a contract for contractor maintenance support is awarded, it is essential that responsible government
managers provide an adequate level of technical oversight of the contractor's performance in executing the
contract. Tracking the administrative details of the contract is not included - the contracts office that 
issued the contract is responsible for this tracking. Instead, technical oversight ensures that the end 
customer and the customer's mission are being adequately served, within the scope of the contract. 
Trending is important in this regard, so that potential problems are addressed before the customer and 
mission are negatively affected. 

6-6. Incentives



Incentives are often used to motivate contractors to achieve some level of performance above the 
contractually required minimum. Such incentives are often used on construction projects to keep the 
construction time to a minimum. Incentives can be positive or negative. 

a. Positive incentives. A positive incentive is one involving rewards. If the contractor exceeds the 
minimum levels of performance, a monetary reward is paid. Examples of exceeding the minimum level 
of performance are listed in table 6-1. 

Table 6-1. Examples of positive incentives. 



Minimum level of performance: Complete construction within 16 weeks
Reward level: Complete construction in 15 weeks or less
Typical reward: Bonus of x% of contract value for each week early, up to a maximum of y%

Minimum level of performance: Maximum downtime of 150 hours in any 1-year period
Reward level: Downtime does not exceed 140 hours*
Typical reward: Bonus of x% of one year contract value for each 15-hour reduction in downtime below
140 hours

Minimum level of performance: Maximum TAT <30 calendar days
Reward level: Maximum TAT <25 calendar days*
Typical reward: Bonus of x% for each day reduction in maximum TAT achieved over a six month period

*Allows for normal statistical variation in downtime.
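
The downtime incentive in table 6-1 can be sketched as a small calculation. The contract value and the bonus percentage (the "x%" of the table) are hypothetical, and counting only whole 15-hour steps below the 140-hour reward level is an assumed reading of the table entry.

```python
def downtime_bonus(actual_hours, contract_value, bonus_pct_per_step,
                   reward_level_hours=140.0, step_hours=15.0):
    """Bonus for annual downtime below the reward level.

    Assumption: one bonus increment is earned for each complete
    step_hours reduction below reward_level_hours.
    """
    if actual_hours >= reward_level_hours:
        return 0.0
    steps = int((reward_level_hours - actual_hours) // step_hours)
    return steps * (bonus_pct_per_step / 100.0) * contract_value


# Hypothetical: $1,000,000 one-year contract value, x = 0.5 percent per step.
for hours in (150.0, 130.0, 110.0):
    print(hours, downtime_bonus(hours, 1_000_000.0, 0.5))
```

With these hypothetical values, 110 hours of downtime is two full steps below 140 hours and earns two increments, while 130 hours earns none because it is less than one full step below the reward level.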

b. Negative incentives. A negative incentive is a penalty imposed for failing to meet a contractual 
requirement. It is rare that some kind of penalty is not imposed whenever a contractual requirement is not 
met. However, the type of negative incentive intended here is one related to a specific performance 
requirement, such as availability. The objective of a negative incentive is similar to that of a positive 
incentive, in that both are intended to ensure that the performance requirements in question are met.
However, the negative incentive provides no motivation for exceeding the requirement. Moreover, 
experts debate whether or not a negative incentive is as effective as a positive one. 

APPENDIX B

STATISTICAL DISTRIBUTION USED IN RELIABILITY AND MAINTAINABILITY

B-1. Introduction to statistical distribution

Many statistical distributions are used to model various reliability and maintainability parameters. The 
particular distribution used depends on the nature of the data being analyzed. 

a. Exponential and Weibull. These two distributions are commonly used for reliability modeling - the 
exponential is used because of its simplicity and because it has been shown in many cases to fit electronic 
equipment failure data, and the Weibull because it consists of a family of different distributions that can 
be used to fit a wide variety of data and it models wearout (i.e., an increasing hazard function). 

b. Normal and lognormal. Although also used to model reliability, the normal and lognormal
distributions are more often used to model repair times. In this application, the normal is most applicable to
simple maintenance tasks that consistently require a fixed amount of time to complete with little variation.
The lognormal is applicable to maintenance tasks where the task time and frequency vary, which is often
the case for complex systems and products.

B-2. The exponential distribution 

The exponential distribution is widely used to model electronic reliability failures in the operating domain
that tend to exhibit a constant failure rate. To fail exponentially means that the distribution of failure
times fits the exponential distribution as shown in table B-1. The characteristics of the exponential
distribution are listed in table B-2. Figure B-1 shows the exponential pdf for varying values of λ.

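Table B-1. Summary of the exponential distribution

Probability Density Function:  $f(t) = \lambda e^{-\lambda t}$

Reliability Function:  $R(t) = e^{-\lambda t}$

Hazard Function:  $h(t) = f(t)/R(t) = \lambda e^{-\lambda t} / e^{-\lambda t} = \lambda$

(Each function is plotted against time; the hazard function plots as a constant, h(t) = λ.)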

Table B-2. Characteristics of the exponential distribution

• It has a single parameter, λ; the mean of the distribution is 1/λ. For reliability applications, λ is
  called the failure rate.
• λ, the failure rate, is a constant. If an item has survived for t hours, the chance of it failing during
  the next hour is the same as if it had just been placed in service.
• The mean-time-between-failure (MTBF) = 1/λ.
• The mean of the distribution occurs at about the 63rd percentile. Thus, if an item with a 1000-hour
  MTBF had to operate continuously for 1000 hours, the probability of success (survival) would be
  only 37%.
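
The characteristics listed above can be checked numerically. The following sketch uses the standard exponential forms f(t) = λe^(-λt) and R(t) = e^(-λt); the function names and the 1000-hour MTBF example values are illustrative only.

```python
import math


def exp_pdf(t, failure_rate):
    """f(t) = lambda * exp(-lambda * t)."""
    return failure_rate * math.exp(-failure_rate * t)


def exp_reliability(t, failure_rate):
    """R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)


def exp_hazard(t, failure_rate):
    """h(t) = f(t) / R(t), which reduces to lambda for every t."""
    return exp_pdf(t, failure_rate) / exp_reliability(t, failure_rate)


mtbf = 1000.0             # hours
failure_rate = 1.0 / mtbf  # failures per hour

# Probability of surviving one MTBF: about 37 percent.
print(round(exp_reliability(1000.0, failure_rate), 3))

# Constant failure rate: surviving the next 100 hours is equally likely
# for a new item and for one that has already survived 500 hours.
new_item = exp_reliability(100.0, failure_rate)
aged_item = exp_reliability(600.0, failure_rate) / exp_reliability(500.0, failure_rate)
print(new_item, aged_item)
```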



Figure B-1. The exponential pdf for varying values of λ. (f(t) plotted against time to failure for
λ = 0.0003, 0.0002, and 0.0001.)

B-3. The Weibull distribution 

The Weibull distribution is an important distribution because it can be used to represent many different 
pdfs; therefore, it has many applications. The characteristics of the Weibull are shown in table B-3. The 
distribution is described in table B-4. Figure B-2 shows the 2-parameter Weibull pdf for different values 
of β and a given value of η.

Table B-3. Characteristics of the Weibull distribution

• It has 2 (β and η) or 3 (β, η, and γ) parameters.
  - The shape parameter, β, describes the shape of the pdf.
  - The scale parameter, η, is the 63rd percentile value of the distribution and is called the
    characteristic life. In some texts, θ is used as the symbol for the characteristic life.
  - The location parameter, γ, is the value that represents a failure-free or prior use period for the
    item. If there is no prior use or period where the probability of failure is zero, then γ = 0 and
    the Weibull distribution becomes a 2-parameter distribution.
• β, η, and γ can be estimated using Weibull probability paper or software programs.
• When β = 1 and γ = 0, the Weibull is exactly equivalent to the exponential distribution.
• When β = 3.44, the Weibull closely approximates the normal distribution.
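
Several of these characteristics can be verified with a short sketch. It assumes the usual textbook form of the 3-parameter Weibull, R(t) = exp(-((t - γ)/η)^β) and h(t) = (β/η)((t - γ)/η)^(β - 1); the function names and numeric values are illustrative only.

```python
import math


def weibull_reliability(t, beta, eta, gamma=0.0):
    """R(t) = exp(-((t - gamma) / eta) ** beta); R = 1 during the failure-free period."""
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / eta) ** beta))


def weibull_hazard(t, beta, eta, gamma=0.0):
    """h(t) = (beta / eta) * ((t - gamma) / eta) ** (beta - 1), for t > gamma."""
    if t <= gamma:
        raise ValueError("this sketch evaluates the hazard only for t > gamma")
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1.0)


def weibull_pdf(t, beta, eta, gamma=0.0):
    """f(t) = h(t) * R(t), for t > gamma."""
    return weibull_hazard(t, beta, eta, gamma) * weibull_reliability(t, beta, eta, gamma)


eta = 1000.0  # characteristic life, hours (illustrative)

# About 63 percent of items have failed by the characteristic life, for any beta.
for beta in (0.5, 1.0, 3.44):
    print(beta, round(1.0 - weibull_reliability(eta, beta, eta), 3))

# beta = 1, gamma = 0 reproduces the exponential with failure rate 1/eta.
print(weibull_hazard(500.0, 1.0, eta), 1.0 / eta)

# beta > 1 models wearout: the hazard increases with time.
print(weibull_hazard(200.0, 2.0, eta) < weibull_hazard(400.0, 2.0, eta))
```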



Table B-4. Summary of the Weibull distribution 



Probability Density Function 



Reliability Function 



Hazard Function 

Figure B-2. The two-parameter Weibull pdf for different values of β and a given value of η.

B-4. The normal distribution



The pdf of the Normal distribution is often called the bell curve because of its distinctive shape. The 
Normal distribution is described in table B-5. The characteristics of the Normal distribution are shown in 
table B-6. Figure B-3 shows the normal pdf for different values of σ and a fixed value of μ.

Table B-5. Summary of the normal distribution 



Probability Density Function:  $f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]$

Reliability Function:  $R(t) = \int_{t}^{\infty} f(x)\,dx$

Hazard Function:  $h(t) = \frac{f(t)}{R(t)}$


Table B-6. Characteristics of the normal distribution 



• It has two parameters:
  - The mean, μ, is the 50th percentile of the distribution. The distribution is symmetrical around the
    mean.
  - The standard deviation, σ, is a measure of the amount of spread in the distribution.
• If t has the pdf defined in table B-5 and μ = 0 and σ = 1, then t is said to have a standardized normal
  distribution.
• The integral of a distribution's pdf is its cumulative distribution function, used to derive the reliability
  function. The integral of the normal pdf cannot be evaluated using the Fundamental Theorem of
  Calculus because there is no elementary function whose derivative equals exp(-x^2/2). However,
  numerical integration methods have been used to evaluate the integral and tabulate values for the
  standard normal distribution.
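
In practice the tabulated values are usually replaced by a library routine. The sketch below evaluates the normal reliability function with the complementary error function from the Python standard library, using the identity R(t) = 0.5 erfc((t - μ)/(σ√2)); the function names and the repair-time values are illustrative only.

```python
import math


def normal_pdf(t, mu, sigma):
    """f(t) = exp(-(t - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_reliability(t, mu, sigma):
    """R(t) = integral of f from t to infinity = 0.5 * erfc((t - mu) / (sigma * sqrt(2)))."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def normal_hazard(t, mu, sigma):
    """h(t) = f(t) / R(t)."""
    return normal_pdf(t, mu, sigma) / normal_reliability(t, mu, sigma)


# Illustrative values: a task with mean 2.0 hours and standard deviation 0.25 hours.
mu, sigma = 2.0, 0.25
print(normal_reliability(mu, mu, sigma))          # the mean is the 50th percentile
print(round(normal_reliability(mu + sigma, mu, sigma), 4))
print(normal_hazard(mu, mu, sigma) < normal_hazard(mu + sigma, mu, sigma))
```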

Figure B-3. The normal pdf for varying values of σ and a fixed μ.

B-5. The lognormal distribution

The lognormal distribution is summarized in table B-7. The characteristics of the lognormal distribution 
are shown in table B-8. Figure B-4 shows the distribution for different values of μ and σ.

Table B-7. Summary of the lognormal distribution 



Probability Density Function:  $f(t) = \frac{1}{\sigma t\sqrt{2\pi}} \exp\left[-\frac{(\ln t-\mu)^2}{2\sigma^2}\right]$

Reliability Function:  $R(t) = \int_{t}^{\infty} f(x)\,dx$

Hazard Function:  $h(t) = \frac{f(t)}{R(t)}$



Table B-8. Characteristics of the lognormal distribution 



• It has two parameters:
  - The mean, μ. Unlike the mean of the Normal distribution, the mean of the lognormal is not the
    50th percentile of the distribution and the distribution is not symmetrical around the mean.
  - The standard deviation, σ.
• The logarithms of the measurements of the parameter of interest (e.g., time to failure, time to repair)
  are normally distributed.
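
The asymmetry of the lognormal can be illustrated numerically. This sketch assumes the usual convention in which μ and σ are the mean and standard deviation of ln t, so that the median is exp(μ) and the mean is exp(μ + σ²/2); the function names and parameter values are illustrative only.

```python
import math


def lognormal_reliability(t, mu, sigma):
    """R(t) = 0.5 * erfc((ln t - mu) / (sigma * sqrt(2))), for t > 0.

    Assumption: mu and sigma describe ln t, not t itself.
    """
    return 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2.0)))


def lognormal_median(mu, sigma):
    """50th percentile: exp(mu)."""
    return math.exp(mu)


def lognormal_mean(mu, sigma):
    """Mean: exp(mu + sigma^2 / 2), always above the median."""
    return math.exp(mu + 0.5 * sigma * sigma)


# Illustrative repair-time parameters (ln of hours).
mu, sigma = 1.0, 0.8
median = lognormal_median(mu, sigma)
mean = lognormal_mean(mu, sigma)
print(median, mean)

# Half of the repairs take longer than the median, but fewer than half
# take longer than the mean, because the long right tail pulls the mean up.
print(lognormal_reliability(median, mu, sigma))
print(lognormal_reliability(mean, mu, sigma))
```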



Figure B-4. The lognormal pdf for different values of μ and σ. (f(t) plotted against time to failure.)
APPENDIX C

AVAILABILITY AND OPERATIONAL READINESS

C-1. Availability

In general, availability is the ability of a product or service to be ready for use when a customer wants to 
use it. That is, it is available if it is in the customer's possession and works when it's turned on or used. A 
product that's "in the shop" or is in the customer's possession but doesn't work is not available. Measures 
of availability are shown in table C-1.

Table C-1. Quantitative measures of availability



Measure: Inherent availability, Ai

Equation: Ai = MTBF / (MTBF + MTTR)

Description:

• Where MTBF is the mean time between failure and MTTR is the mean time to repair

• A probabilistic measure

• Reflects the instantaneous probability that a component will be up. Ai considers only downtime for
repair due to failures. No logistics delay time, preventative maintenance, etc. is included.

Measure: Operational availability, Ao

Equation: Ao = MTBM / (MTBM + MDT)

Description:

• Where MTBM is the mean time between maintenance (preventive and corrective) and MDT is the
mean downtime, which includes MTTR and all other time involved with downtime such as logistic delays

• A probabilistic measure

• Similar to inherent availability but includes ALL downtime. Included is downtime for corrective
maintenance and preventative maintenance, including any logistics delay time.

MTBF = Mean Time Between Failure
MTBM = Mean Time Between Maintenance
MDT = Mean Downtime
MTTR = Mean Time to Repair (corrective only)



a. Nature of the equations. Note that the equations are time independent and probabilistic in nature. 
The value of availability yielded by each equation is the same whether the period of performance being 
considered is 1 hour or a year. 
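
The two measures in table C-1 can be sketched in a few lines of Python. The function names and the numerical values below are hypothetical and serve only to show the arithmetic; all times must be expressed in the same unit (hours are assumed here).

```python
def inherent_availability(mtbf: float, mttr: float) -> float:
    """Ai = MTBF / (MTBF + MTTR); counts only downtime for repair of failures."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf / (mtbf + mttr)


def operational_availability(mtbm: float, mdt: float) -> float:
    """Ao = MTBM / (MTBM + MDT); counts all downtime, including logistics delays."""
    if mtbm <= 0 or mdt < 0:
        raise ValueError("MTBM must be positive and MDT non-negative")
    return mtbm / (mtbm + mdt)


if __name__ == "__main__":
    # Hypothetical component: fails every 2000 h on average, 4 h to repair.
    print("Ai =", inherent_availability(2000.0, 4.0))
    # Same component: maintained every 500 h on average, 10 h mean downtime.
    print("Ao =", operational_availability(500.0, 10.0))
```

Because neither function takes a time argument, the result is the same for any period of performance, which is the time independence noted in paragraph a.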

b. Derivation of steady state equation for availability. The equations in table C-1 are steady state
equations. The equation for inherent availability (equation C-1), for example, is the steady state equation
derived from equation C-2, as time approaches infinity:



$$A_i = \frac{MTBF}{MTBF + MTTR}$$
Equation C-1

$$A_i(t) = \frac{MTBF}{MTBF + MTTR} + \frac{MTTR}{MTBF + MTTR}\, e^{-\left(\frac{1}{MTBF} + \frac{1}{MTTR}\right)t}$$
Equation C-2
1. Equation C-1 represents a limit for inherent availability. It represents the long-term proportion of
time that a system will be operational.

2. Assuming that the times to failure and times to repair are both exponentially distributed, with rates λ
and μ, respectively, equation C-1 can be expressed as:

$$A_i = \frac{1/\lambda}{1/\lambda + 1/\mu} = \frac{\mu}{\mu + \lambda}$$
Equation C-3



3. The derivation of equation C-1 now follows. A simple Markov model is used to evaluate
availability. The probabilities of being in either the up state or the down state are determined using the
Laplace transform. The model and equations are:




$$\frac{dP_{Up}(t)}{dt} = -\lambda P_{Up}(t) + \mu P_{Down}(t)$$
Equation C-4

$$sL_{Up}(s) - P_{Up}(0) = sL_{Up}(s) - 1 = -\lambda L_{Up}(s) + \mu L_{Down}(s)$$
Equation C-5

$$1 - sL_{Up}(s) = sL_{Down}(s) = \lambda L_{Up}(s) - \mu L_{Down}(s)$$
Equation C-6

From equation C-5,

$$L_{Up}(s) = \frac{1 + \mu L_{Down}(s)}{s + \lambda}$$
Equation C-7

From equation C-6,

$$L_{Down}(s) = \frac{\lambda L_{Up}(s)}{s + \mu}$$
Equation C-8

4. Substituting the expression for L_Down(s) into equation C-7,

$$L_{Up}(s) = \frac{1}{s + \mu + \lambda} + \frac{\mu}{s(s + \lambda + \mu)}$$
Equation C-9
5. Then, availability is the inverse of the Laplace transform for L_Up(s). To obtain the inverse,

$$L_{Up}(s) = \frac{1}{s + \mu + \lambda} + \frac{\mu}{s(s + \lambda + \mu)} = \frac{1}{\lambda + \mu}\left[\frac{\mu(s + \mu + \lambda) + \lambda s}{s(s + \mu + \lambda)}\right]$$

$$= \frac{\mu}{\lambda + \mu} \times \frac{1}{s} + \frac{\lambda}{\lambda + \mu} \times \frac{1}{s + \mu + \lambda}$$

$$= \frac{\mu}{\lambda + \mu}\int_0^{\infty} e^{-st}\,dt + \frac{\lambda}{\mu + \lambda}\int_0^{\infty} e^{-(s + \mu + \lambda)t}\,dt$$

$$= \int_0^{\infty} e^{-st}\left[\frac{\mu}{\lambda + \mu} + \frac{\lambda}{\mu + \lambda}\, e^{-(\mu + \lambda)t}\right]dt$$

so that

$$A_i(t) = \frac{\mu}{\mu + \lambda} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu)t}$$
Equation C-10

6. Taking the limit of equation C-10 as t approaches infinity,

$$A_i = \lim_{t \to \infty} A_i(t) = \frac{\mu}{\mu + \lambda} = \frac{MTBF}{MTBF + MTTR}$$

Q.E.D.
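
The result can be checked numerically. The minimal Python sketch below evaluates the time-dependent availability of equation C-10, its steady state limit, and a simple forward-Euler integration of the up-state equation of the Markov model, assuming that the item is up at t = 0 and that the up and down probabilities sum to one. The function names, the step count, and the example rates are assumptions made for illustration.

```python
import math


def availability_at(t: float, lam: float, mu: float) -> float:
    """Equation C-10: probability of being up at time t, given up at t = 0."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def steady_state_availability(lam: float, mu: float) -> float:
    """Limit of equation C-10 as t approaches infinity (equation C-3)."""
    return mu / (lam + mu)


def euler_availability(t_end: float, lam: float, mu: float,
                       steps: int = 100_000) -> float:
    """Integrate dP_up/dt = -lam*P_up + mu*P_down with P_down = 1 - P_up."""
    dt = t_end / steps
    p_up = 1.0  # the item is assumed to be up at t = 0
    for _ in range(steps):
        p_up += dt * (-lam * p_up + mu * (1.0 - p_up))
    return p_up


if __name__ == "__main__":
    # Hypothetical item: MTBF = 1000 h and MTTR = 10 h.
    lam, mu = 1.0 / 1000.0, 1.0 / 10.0
    for hours in (0.0, 10.0, 20.0, 100.0):
        print(hours, availability_at(hours, lam, mu))
    print("steady state:", steady_state_availability(lam, mu))
    print("Euler at 20 h:", euler_availability(20.0, lam, mu))
```

For these example rates the transient term decays with a time constant of roughly 1/(λ + μ), about 10 hours, after which the availability is indistinguishable from MTBF/(MTBF + MTTR).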



C-2. Operational readiness 



Closely related to the concept of operational availability but broader in scope is operational readiness. 
Operational readiness is defined as the ability of a military unit to respond to its operational plans upon 
receipt of an operations order. It is, therefore, a function not only of the product availability, but also of
assigned numbers of operating and maintenance personnel, the supply, the adequacy of training, and so 
forth. 

a. Readiness in the commercial world. Although operational readiness has traditionally been a military 
term, it is equally applicable in the commercial world. For example, a manufacturer may have designed,
and be capable of making, very reliable, maintainable products. If that manufacturer has a poor
distribution and transportation system, or does not provide the service or stock the parts that customers
need to use the product effectively, then its readiness to go to market with the product is low.

b. Relationship of availability and operational readiness. The concepts of availability and operational 
readiness are obviously related. Important to note, however, is that while the inherent design 
characteristics of a product totally determine inherent availability, other factors influence operational 
availability and operational readiness. The reliability and maintainability engineers directly influence the 
design of the product. Together, they can affect other factors by providing logistics planners with the 
information needed to identify required personnel, spares, and other resources. This information includes 
the identification of maintenance tasks, repair procedures, and needed support equipment. 
APPENDIX D
ACRONYMS

D-1. ACRONYMS

Ai        Availability, Inherent
Ao        Availability, Operational
CBM       Condition-based Maintenance
CM        Corrective Maintenance
CND       Cannot Duplicate
FEA       Finite Element Analysis
FMEA      Failure Modes and Effects Analysis
FMECA     Failure Modes, Effects, and Criticality Analysis
FD        Fault Detection
FI        Fault Isolation
FTA       Fault Tree Analysis
LCC       Life Cycle Cost
LRU       Line Replaceable Unit
MA        Maintenance Action
MTBF      Mean Time Between Failure
MTBM      Mean Time Between Maintenance
MTTR      Mean Time To Repair
NDE       Nondestructive Evaluation
NDI       Nondestructive Inspection
O&S       Operating and Support
PM        Preventive Maintenance
pdf       Probability Density Function
R&R       Remove and Replace
RBD       Reliability Block Diagram
RCM       Reliability-Centered Maintenance
RTOK      Retest OK
R&M       Reliability and Maintainability
TAT       Turn Around Time
APPENDIX E
Flexible Methodology Example 



Definition of Detection Method, Failure Occurrence, and Severity Levels for Flexible RCM 
Analysis 

It should be understood that, from an overall perspective, the flexible analysis approach focuses on
subsystem and system level function loss. This is different from traditional analysis techniques, which
focus on specific failure modes for specific components. Focus on the subsystem and system level allows
RCM analysis to be conducted despite the absence of complete component level information.

Detection Method 

As stated in the Ground Rules and Assumptions section, when system controls, automation
configurations, and system safeguards are unknown, the Detection Method Level can be assumed to be 1.
This assumes and stresses that, for a mission critical facility, all item and system level function losses
should and will be apparent.

Although this is an acceptable approach for initial analysis and demonstration purposes, it should be
understood that the presence or absence of a detection method in a system has a direct effect on the risk
associated with the operation of that system. Therefore, consideration of the detection method will
provide more accurate and better-resolved analysis results and recommendations. Furthermore, an
understanding of current detection method provisions, along with the results of an analysis that
considered detection method and component level failure modes, can and should be used to make
recommendations on future detection method provisions.

Occurrence 

Equipment-specific PREP database availability numbers provide an indication of failure frequency.
These metrics help to provide less subjective item and system risk assessments. However, they must
be adjusted to account for system redundancy, and ranked into discrete occurrence levels to be used in
qualitative equipment criticality calculations.

By design and purpose, a redundant system is more reliable and less vulnerable than a single point, with
respect to system function and mission requirements. Therefore, the occurrence level for a single point
function must be weighted to reflect the operation, presumed reliability, and severity of loss of function of
the redundant component system as accurately as possible.

The following formula is used to calculate the adjusted availability of a given subsystem due to a level of
component or subsystem redundancy.

$$A' = \sum_{k=m}^{n} \frac{n!}{k!\,(n-k)!}\, A^{k}\,(1-A)^{(n-k)}$$

Where:

A = Initial inherent component availability

A' = Adjusted redundant component availability level

m = Minimum number of components needed

n = Number of components available

k = Number of operating components in the redundant system being considered (the summation index)
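
A minimal Python sketch of this calculation follows. It assumes identical, independent components, which is what the binomial form of the formula implies. The function name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math


def adjusted_availability(a: float, m: int, n: int) -> float:
    """Availability of an m-out-of-n redundant group of identical components.

    a: initial inherent availability of one component
    m: minimum number of components needed
    n: number of components available
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if not 1 <= m <= n:
        raise ValueError("require 1 <= m <= n")
    total = sum(
        math.comb(n, k) * a ** k * (1.0 - a) ** (n - k)
        for k in range(m, n + 1)
    )
    return min(1.0, total)  # guard against floating-point round-off above 1


if __name__ == "__main__":
    # One of two components needed, each with availability 0.999988924.
    print(adjusted_availability(0.999988924, 1, 2))
    # One of four components needed, each with availability 0.999993654.
    print(adjusted_availability(0.999993654, 1, 4))
```

When only one of n components is needed, the sum reduces to 1 - (1 - A)^n, which gives a quick hand check of the result.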

With availability metrics representative of system configuration now available, component availability is 
ranked to provide discrete subsystem occurrence levels, as follows: 



Availability (nines)    Occurrence Rank    Occurrence Description
> 0.999999999           1                  Almost Never
0.99999999              2                  Remote
0.9999999               3                  Very Slight
0.999999                4                  Slight
0.99999                 5                  Low
0.9999                  6                  Medium
0.999                   7                  Moderately High
0.99                    8                  High
0.9                     9                  Very High
                        10                 Almost Certain
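
The ranking can be sketched in Python as a simple threshold lookup. The table lists only boundary values, so the sketch assumes that an availability falling between two listed values takes the rank of the highest listed value it meets or exceeds, and that anything below 0.9 is rank 10. That interpretation, and the names used, are assumptions and are not stated in the table.

```python
# (threshold, rank, description) rows for ranks 2 through 9 of the table.
OCCURRENCE_LEVELS = [
    (0.99999999, 2, "Remote"),
    (0.9999999, 3, "Very Slight"),
    (0.999999, 4, "Slight"),
    (0.99999, 5, "Low"),
    (0.9999, 6, "Medium"),
    (0.999, 7, "Moderately High"),
    (0.99, 8, "High"),
    (0.9, 9, "Very High"),
]


def occurrence_rank(availability: float) -> tuple:
    """Return (rank, description) for an adjusted availability value."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if availability > 0.999999999:  # rank 1 uses a strict "greater than"
        return 1, "Almost Never"
    for threshold, rank, description in OCCURRENCE_LEVELS:
        if availability >= threshold:  # assumed: boundary value belongs to its row
            return rank, description
    return 10, "Almost Certain"  # assumed: anything below 0.9


if __name__ == "__main__":
    for value in (0.9999999999, 0.999988924, 0.95, 0.5):
        print(value, occurrence_rank(value))
```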



Severity 

It is also important to consider the concept of failure severity. Severity pertains to and ranks the 
consequences of system level failure mode effects. For example, a highly probable failure may occur for a 
subsystem of a piece of critical equipment without severe consequences. 
Severity rankings used are as follows:

Ranking    Effect              Comment
1          None                No reason to expect failure to have any effect on
                               Safety, Health, Environment or Mission
2          Very Low            Minor disruption to mission.
3          Low                 Minor disruption to mission.
4          Low to Moderate     Moderate disruption to mission.
5          Moderate            Moderate disruption to mission.
6          Moderate to High    Moderate disruption to mission.
7          High                High disruption to mission.
8          Very High           High disruption to mission.
9          Hazard              Extremely high disruption to mission.
10         Hazard              Extremely high disruption to mission.



RPN Calculations and Ranking Methods for Flexible Analysis 

Severity, occurrence, and detection method levels are then utilized to produce a subsystem risk 
assessment as follows: 

RPN = O × S × D

Where: 

RPN = Risk associated with failure mode (Risk Priority Number) 

S = Severity level for failure mode 

O = Occurrence level for failure mode 

D = Detection method level (1) 

This calculation will be performed for every subsystem item in the master equipment listing. With this 
information, Risk Priority Numbers for sub-systems and systems can be obtained as follows: 



$$RPN_S = \sum_{n=1}^{j} (RPN_C)_n$$

Where:

RPN_S = Risk Priority Number for the current system being analyzed

RPN_C = Risk Priority Number for the current subsystem

n = The current subsystem being analyzed

j = Total number of components in the sub-system or system

Results - System X

Facility Identifier  Equipment Type  Parent System  M  N  PREP ID  A            A'             O' Ranked  S  RPN
A-1                  A               X              1  2  13       0.999988924  0.9999999999              9  9
A-2                  A               X              1  2  13       0.999988924  0.9999999999              9  9
B-1                  B               X              1  4  163      0.999993654  1.00000000000             9  9
B-2                  B               X              1  4  163      0.999993654  1.00000000000             9  9
B-3                  B               X              1  4  163      0.999993654  1.00000000000             9  9
B-4                  B               X              1  4  163      0.999993654  1.00000000000             9  9
System RPN                                                                                                   54

Item and system risk assessments can now be used to apply the RCM decision logic (see figure 5.2) and
to build a maintenance tasking program. Items and systems assessed to be of high operational risk should,
in particular, be applied to the decision logic and should receive high levels of maintenance focus. Items
having extremely low operational risk will receive low levels of maintenance focus, and may be allowed
to run to failure.
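
The RPN arithmetic used for System X can be sketched in Python as follows. The function names are hypothetical. The example assumes an occurrence level of 1 for each item, which is what the ranking table implies for adjusted availabilities above 0.999999999, together with the severity of 9 and detection level of 1 used in the example.

```python
def risk_priority_number(occurrence: int, severity: int, detection: int = 1) -> int:
    """RPN = O x S x D, with each level on the 1-to-10 scale used in this appendix."""
    for name, level in (("occurrence", occurrence),
                        ("severity", severity),
                        ("detection", detection)):
        if not 1 <= level <= 10:
            raise ValueError(f"{name} level must be between 1 and 10")
    return occurrence * severity * detection


def system_rpn(item_rpns) -> int:
    """System RPN: the sum of the RPNs of the items in the system."""
    return sum(item_rpns)


if __name__ == "__main__":
    # Six items (A-1, A-2, B-1 to B-4), each with assumed O = 1, S = 9, D = 1.
    items = [risk_priority_number(1, 9) for _ in range(6)]
    print("item RPNs:", items)
    print("system RPN:", system_rpn(items))
```

Each item RPN is 9 and the six items sum to a system RPN of 54, in agreement with the results for System X.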
GLOSSARY

-A-

ACTIVE TIME: That time during which an item is in an operational inventory.

AFFORDABILITY: Affordability is a measure of how well customers can afford to purchase, operate,
and maintain a product over its planned service life. Affordability is a function of product value and
product costs. It is the result of a balanced design in which long-term support costs are considered
equally with near-term development and manufacturing costs.

ALIGNMENT: Performing the adjustments that are necessary to return an item to specified operation.

AVAILABILITY: The instantaneous probability that a component will be up.

AVAILABILITY, INHERENT (Ai). The instantaneous probability that a component will be up. Ai
considers only downtime for repair due to failures. No logistics delay time, preventative maintenance,
etc. is included.

AVAILABILITY, OPERATIONAL (Ao). Ao is the instantaneous probability that a component will
be up but differs from inherent availability in that it includes ALL downtime. Included is downtime for
both corrective maintenance and preventative maintenance, including any logistics delay time.

-C- 

CALIBRATION: A comparison of a measuring device with a known standard and a subsequent
adjustment to eliminate any differences. Not to be confused with alignment.

CANNOT DUPLICATE (CND): A situation when a failure has been noted by the operator but cannot
be duplicated by maintenance personnel attempting to correct the problem. Also see Retest OK.

CHECKOUT: Tests or observations of an item to determine its condition or status.

COMPONENT. A piece of electrical or mechanical equipment viewed as an entity for the purpose of
reliability evaluation.

CONDITION-BASED PM: Maintenance performed to assess an item's condition and performed as a
result of that assessment. Some texts use terms such as predictive maintenance and on-condition. The
definition of condition-based PM used herein includes these concepts. In summary, the objectives of
condition-based PM are to first evaluate the condition of an item, then, based on the condition, either
determine if a hidden failure has occurred or determine if a failure is imminent, and then take appropriate
action. Maintenance that is required to correct a hidden failure is, of course, corrective maintenance.

CORRECTIVE ACTION: A documented design, process, procedure, or materials change implemented
and validated to correct the cause of failure or design deficiency.

CORRECTIVE MAINTENANCE (CM): All actions performed as a result of failure, to restore an
item to a specified condition. Corrective maintenance can include any or all of the following steps:
Localization, Isolation, Disassembly, Interchange, Reassembly, Alignment and Checkout.

COST: The expenditure of resources (usually expressed in monetary units) necessary to develop,
acquire, or use a product over some defined period of time.

-D- 

DEPENDABILITY: A measure of the degree to which an item is operable and capable of performing its 
required function at any (random) time during a specified mission profile, given item availability at the 
start of the mission. (Item state during a mission includes the combined effects of the mission-related 
system R&M parameters but excludes non-mission time; see availability). 

DETECTABLE FAILURE: Failures at the component, equipment, subsystem, or system (product)
level that can be identified through periodic testing or revealed by an alarm or an indication of an
anomaly.

DIAGNOSTICS: The hardware, software, or other documented means used to determine that a
malfunction has occurred and to isolate the cause of the malfunction. Also refers to "the action of detecting
and isolating failures or faults."

DOWNTIME: That element of time during which an item is in an operational inventory but is not in 
condition to perform its required function. 

-E- 

EFFECTIVENESS: The degree to which PM can provide a quantitative indication of an impending 
functional failure, reduce the frequency with which a functional failure occurs, or prevent a functional 
failure. 

EQUIPMENT: A general term designating an item or group of items capable of performing a complete 
function. 

-F- 

FAILURE (f). The termination of the ability of a component or system to perform a required function.

FAILURE, CATASTROPHIC: A failure that causes loss of the item, human life, or serious collateral 
damage to property. 

FAILURE, HIDDEN: A failure that is not evident to the operator; that is, it is not a functional failure.
A hidden failure may occur in two different ways. In the first, the item that has failed is one of two or
more redundant items performing a given function. The loss of one or more of these items does not result
in a loss of the function. The second way in which a hidden failure can occur is when the function
performed by the item is normally inactive. Only when the function is eventually required will the failure
become evident to the operator. Hidden failures must be detected by maintenance personnel.

FAILURE, INTERMITTENT: Failure for a limited period of time, followed by the item's recovery of 
its ability to perform within specified limits without any remedial action. 
FAILURE, RANDOM: A failure, the occurrence of which cannot be predicted except in a probabilistic
or statistical sense.

FAILURE ANALYSIS: Subsequent to a failure, the logical systematic examination of an item, its
construction, application, and documentation to identify the failure mode and determine the failure
mechanism and its basic cause.

FAILURE EFFECT: The consequence(s) a failure mode has on the operation, function, or status of an 
item. Failure effects are typically classified as local, next higher level, and end. 

FAILURE MECHANISM: The physical, chemical, electrical, thermal or other process which results in 
failure. 

FAILURE MODE: The way in which a failure is observed; describes the way the failure occurs (e.g.,
short, open, fracture, and excessive wear).

FAILURE MODE AND EFFECTS ANALYSIS (FMEA): A procedure by which each potential
failure mode in a product (system) is analyzed to determine the results or effects thereof on the product and
to classify each potential failure mode according to its severity or risk priority number.

FAILURE MODES, EFFECTS, AND CRITICALITY ANALYSIS (FMECA): The term is used to 
emphasize the classifying of failure modes as to their severity (criticality). 

FAILURE RATE (λ): The mean (arithmetic average, also known as the forced outage rate) number of
failures of a component and/or system per unit exposure time. The most common unit in reliability
analyses is hours (h). However, some industries use failures per year (f/y), which is denoted by the symbol (λy).

FAILURE REPORTING AND CORRECTIVE ACTION SYSTEM (FRACAS): A closed-loop
system for collecting, analyzing, and documenting failures and recording any corrective action taken to
eliminate or reduce the probability of future such failures.

FALSE ALARM: A fault indicated by BIT or other monitoring circuitry where no fault can be found or 
confirmed. 

FAULT: Immediate cause of failure (e.g., maladjustment, misalignment, defect, etc.). 

FAULT DETECTION (FD): A process that discovers the existence of faults. 

FAULT ISOLATION (FI): The process of determining the location of a fault to the indenture level
necessary to effect repair.

FAULT TREE ANALYSIS: An analysis approach in which each potential system failure is traced back 
to all faults that could cause the failure. It is a top-down approach, whereas the FMEA is a bottom-up 
approach. 

FINITE ELEMENT ANALYSIS (FEA): A modeling technique (normally a computer simulation) used
to predict the material response or behavior of the device or item being modeled. FEA can describe
material stresses and temperatures throughout the modeled device by simulating thermal or dynamic loading
conditions. It can be used to assess mechanical failure mechanisms such as fatigue, rupture, creep, and
buckling.

FUNCTIONAL TEST: An evaluation of a product or item while it is being operated and checked under
limited conditions without the aid of its associated equipment in order to determine its fitness for use.

-H- 

HIDDEN FAILURE: See Failure, Hidden. 

-I- 

ISOLATION: Determining the location of a failure to the extent possible, by the use of accessory
equipment.

-L-

LEVELS OF MAINTENANCE: The division of maintenance, based on different and requisite technical
skill, which jobs are allocated to organizations in accordance with the availability of personnel, tools,
supplies, and the time within the organization. Typical maintenance levels are organizational,
intermediate, and depot.

LIFE CYCLE COST (LCC): The sum of acquisition, logistics support, operating, and retirement and 
phase-out expenses. 

LIFE CYCLE PHASES: Identifiable stages in the life of a product from the development of the first
concept to removing the product from service and disposing of it. Within the Department of Defense,
four phases are formally defined: Concept Exploration; Program Definition and Risk Reduction;
Engineering and Manufacturing Development; and Production, Deployment, and Operational Support.
Although not defined as a phase, demilitarization and disposal is defined as those activities conducted at the
end of a product's useful life. Within the commercial sector, various ways of dividing the life cycle into
phases are used. One way of doing this is as follows: Customer Need Analysis, Design and
Development, Production and Construction, Operation and Maintenance, and Retirement and Phase-out.

LINE REPLACEABLE UNIT (LRU): A unit designed to be removed upon failure from a larger entity 
(product or item) in the operational environment, normally at the organizational level. 

LOCALIZATION: Determining the location of a failure to the extent possible, without using accessory 
test equipment. 

LOGISTIC DELAY TIME: That element of downtime during which no maintenance is being
accomplished on the item because of either supply or administrative delay.

LOGISTICS SUPPORT: The materials and services required to enable the operating forces to operate, 
maintain, and repair the end item within the maintenance concept defined for that end item. 



-M- 
MAINTAINABILITY: The relative ease and economy of time and resources with which an item can be
retained in, or restored to, a specified condition when maintenance is performed by personnel having
specified skill levels, using prescribed procedures and resources, at each prescribed level of maintenance
and repair. Also, the probability that an item can be retained in, or restored to, a specified condition when
maintenance is performed by personnel having specified skill levels, using prescribed procedures and
resources, at each prescribed level of maintenance and repair.

MAINTENANCE: All actions necessary for retaining an item in or restoring it to a specified condition.

MAINTENANCE ACTION: An element of a maintenance event. One or more tasks (i.e., fault
localization, fault isolation, servicing and inspection) necessary to retain an item's condition or restore it to a
specified condition.

MAINTENANCE CONCEPT: A description of the planned general scheme for maintenance and
support of an item in the operational environment. It provides a practical basis for design, layout, and
packaging of the system and its test equipment. It establishes the scope of maintenance responsibility for each
level of maintenance and the personnel resources required to maintain the system.

MAINTENANCE EVENT: One or more maintenance actions required to effect corrective and
preventive maintenance due to any type of failure or malfunction, false alarm or scheduled maintenance plan.

MAINTENANCE TASK: The maintenance effort necessary for retaining an item in, or
changing/restoring it to a specified condition.

MAINTENANCE TIME: An element of downtime that excludes modification and delay time.

MEAN DOWNTIME (MDT). The average downtime caused by preventative and corrective
maintenance, including any logistics delay time. This is synonymous with mean time to restore system
(MTTRS) as found in some publications.

MEAN TIME BETWEEN FAILURES (MTBF). The mean exposure time between consecutive
failures of a component. MTBF is a required measurement used for calculating inherent availability. It can be
estimated by dividing the exposure time by the number of failures in that period.

MEAN TIME BETWEEN MAINTENANCE (MTBM). The average time between all maintenance
events that cause downtime, both preventative and corrective maintenance, and also includes any
associated logistics delay time.

MEAN TIME TO REPAIR (MTTR). The mean time to replace or repair a failed component. Logistics
time associated with the repair, such as parts acquisition and crew mobilization, is not included. It can be
estimated by dividing the summation of repair times by the number of repairs and, therefore, is practically
the average repair time. The most common unit in reliability analyses is hours per failure (h/f).

MISSION TIME: That element of up time required to perform a stated mission profile. 

-N- 

NON-DESTRUCTIVE EVALUATION: A collective term referring to a wide range of technologies 
and methods used for nondestructive inspection, evaluation, or testing. 






TM 5-698-2 



NON-DESTRUCTIVE INSPECTION (NDI): Any method used for inspecting an item without
physically, chemically, or otherwise destroying or changing the design characteristics of the item. However, it
may be necessary to remove paint or other external coatings to use the NDI method. A wide range of
technology and methods are usually described as nondestructive inspection, evaluation, or testing
(collectively referred to as non-destructive evaluation or NDE). The core of NDE is commonly thought to
contain ultrasonic, visual, radiographic, eddy current, liquid penetrant, and magnetic particle inspection
methods. Other methodologies include acoustic emission, use of laser interference, microwaves, NMR
and MRI, thermal imaging, and so forth.

NON-DETECTABLE FAILURE: Failures at the component, equipment, subsystem, or system
(product) level that are identifiable by analysis but cannot be identified through periodic testing or revealed by
an alarm or an indication of an anomaly.

-O- 

ON-CONDITION MAINTENANCE: See Condition-based PM. 

OPERATING AND SUPPORT (O&S) COSTS: Those costs associated with operating and supporting 
(i.e., using) a product after it is purchased or fielded. 

OPERATIONAL READINESS: The ability of a military unit to respond to its operation plan(s) upon
receipt of an operations order. (A function of assigned strength, item availability, status, or supply,
training, etc.).

-P- 

PREDICTED: That which is expected at some future time, postulated on analysis of past experience and 
tests. 

PREDICTIVE MAINTENANCE: See Condition-based PM. 

PREVENTATIVE MAINTENANCE (PM): All actions performed in an attempt to retain an item in a 
specified condition. These actions may or may not result in downtime for the component, and may or 
may not be performed on a fixed interval. 

-R- 

REASSEMBLY: Assembling the items that were removed during disassembly and closing the
reassembled items.

REDUNDANCY: The existence of more than one means for accomplishing a given function. Each
means of accomplishing the function need not necessarily be identical.

RELIABILITY (R(t)). The probability that a component can perform its intended function for a
specified time interval (t) under stated conditions. This calculation is based on the exponential distribution.

RELIABILITY-CENTERED MAINTENANCE (RCM): A disciplined logic or methodology used to
identify preventive and corrective maintenance tasks to realize the inherent reliability of equipment at a
minimum expenditure of resources, while ensuring safe operation and use.

RETEST OK (RTOK): A situation where a failure was detected on the system, either through
inspection or testing, but no fault can be found in the item that was eventually removed for repair at a field or
depot location. Also see Cannot Duplicate.

-S- 

SCHEDULED MAINTENANCE: Periodic prescribed inspection and/or servicing of products or items 
accomplished on a calendar, mileage or hours of operation basis. Included in Preventive Maintenance. 

SERVICING: The performance of any act needed to keep an item in operating condition (e.g.,
lubricating, fueling, oiling, cleaning, etc.), but not including preventive maintenance of parts or corrective
maintenance tasks.

SINGLE-POINT FAILURE: A failure of an item that causes the system to fail and for which no
redundancy or alternative operational procedure exists.

SUBSYSTEM: A combination of sets, groups, etc. that performs an operational function within a
product (system) and is a major subdivision of the product. (Example: Data processing subsystem, guidance
subsystem).

SYSTEM DOWNTIME: The time interval between the commencement of work on a system (product) 
malfunction and the time when the system has been repaired and/or checked by the maintenance person, 
and no further maintenance activity is executed. 

SYSTEM: General - A composite of equipment, skills, and techniques capable of performing or
supporting an operational role, or both. A complete system includes all equipment, related facilities,
material, software, services, and personnel required for its operation and support to the degree that it can be
considered self-sufficient in its intended operational environment.

-T-

TESTABILITY: A design characteristic that allows status (operable, inoperable, or degraded) of an item
to be determined and the isolation of faults within the item to be performed in a timely manner.

TOTAL SYSTEM DOWNTIME: The time interval between the reporting of a system (product)
malfunction and the time when the system has been repaired and/or checked by the maintenance person, and
no further maintenance activity is executed.

-U- 

UNSCHEDULED MAINTENANCE: Corrective maintenance performed in response to a suspected 
failure. 

UPTIME: That element of ACTIVE TIME during which an item is in condition to perform its required 
functions. (Increases availability and dependability). 

USEFUL LIFE: The number of life units from manufacture to when the item has an unrepairable failure 
or unacceptable failure rate. Also, the period of time before the failure rate increases due to wearout. 

-W- 
WEAROUT: The process that results in an increase of the failure rate or probability of failure as the
number of life units increases.